Usability and Aesthetics: Better Together for Automated Repair of Web Pages

Thanh Le-Cong
School of Information and Communication Technology
Hanoi University of Science and Technology
Hanoi, Vietnam
[email protected]

Xuan Bach D. Le
School of Computing and Information Systems
University of Melbourne
Melbourne, Australia
[email protected]

Quyet Thang Huynh† and Phi Le Nguyen
School of Information and Communication Technology
Hanoi University of Science and Technology
Hanoi, Vietnam
{thanghq, lenp}@soict.hust.edu.vn

Abstract: With the recent explosive growth of mobile devices such as smartphones and tablets, guaranteeing consistent web appearance across all environments has become a significant problem. This happens simply because it is hard to keep track of the web appearance on the different sizes and types of devices that render the web pages. Therefore, fixing the inconsistent appearance of web pages can be difficult, and the cost incurred can be huge, e.g., poor user experience and the resulting financial loss. Recently, automated web repair techniques have been proposed to automatically resolve inconsistent web page appearance, focusing on improving usability. However, generated patches tend to disrupt the webpage's layout, rendering the repaired webpage aesthetically unpleasing, e.g., distorted images or misalignment of components. In this paper, we propose an automated repair approach for web pages based on meta-heuristic algorithms that can assure both usability and aesthetics. The key novelty that empowers our approach is a novel fitness function that allows us to evolve buggy web pages toward the solution that optimizes both usability and aesthetics at the same time. Empirical evaluations show that our approach is able to successfully resolve mobile-friendly problems in 94% of the evaluation subjects, significantly outperforming state-of-the-art baseline techniques in terms of both usability and aesthetics.

Index Terms: Automated Program Repair, Search-based Software Engineering, Mobile Friendly Problem, Cross Browser Issues

I. INTRODUCTION

In our modern world, the World Wide Web (WWW) has become one of the most popular sources of information [2], [5]. This information is often accessible through web pages via numerous types of devices, such as smartphones and tablets, and via various web browsers, such as Chrome, Safari, Firefox, Internet Explorer, and many more. The variety of devices and browsers, on the one hand, allows various ways to access information but, on the other hand, poses several profound challenges, among which guaranteeing consistent web appearance across all environments is one of the most important.

In practice, developers often have to ensure that web pages' appearance is amenable to various environments. Specifically, there are two popular types of web appearance issues that developers are usually concerned with: the usability and the aesthetics of web pages across devices of different sizes and types. Usability issues include bugs related to font size (which affects readability), the margin of touchable elements, the presence of navigation or content that overflows the device's viewport, etc. Aesthetic issues concern the relative proportions and positioning of elements on the page. To ensure the absence of these issues, developers often have to test web pages and fix any discovered errors manually. This common practice, however, is very time-consuming and expensive due to the enormous number of environments that need to be considered, for example, multiple browsers such as Chrome and Firefox, or multiple devices such as smartphones and tablets. Automated techniques that help developers test web pages and repair their bugs are thus of tremendous value.

†Corresponding author

Recent years have seen practical progress on automated web repair techniques proposed to resolve usability and aesthetic issues on web pages [14], [15]. These approaches generally follow three stages: bug discovery, localization, and repair. The first stage, i.e., bug discovery, can be performed via readily available tools such as the Google Mobile-Friendly Test Tool (GMFT) [3], the Google PageSpeed Insights Tool (PSIT) [4], or Bing [1]. The latter two stages, i.e., localization and repair, are often more challenging and have less tool support. Techniques like MFix [14] often achieve this via a genetic programming approach that gradually generates repair candidates that improve usability over time. Although the proposed repair algorithms can help to improve web pages' appearance, they mainly focus on usability while treating aesthetics as best-effort, so they may not provide a patch that guarantees good usability and aesthetics simultaneously.

Machine-patched pages may suffer from poor aesthetics and result in an undesirable layout. For example, Figure 1 depicts a page repaired by the state-of-the-art MFix [14] across four different runs. From the figure, we can clearly see the instability and imbalance of MFix. The first two patches have layout distortion, as some symbols and text overlap. The third patch is unchanged compared to the original page, meaning that it does not manipulate the original page to reflect developers' intent. The fourth patch is even more severely problematic, as it has cluttered navigation.

arXiv:2201.00117v1 [cs.SE] 1 Jan 2022

In this work, we focus on automated repair of web pages considering both usability and aesthetics. To assess usability, we use PSIT [4], which rates how mobile-friendly a web page is on a scale from 0 to 100; the higher the score, the more usable the web page is. To assess aesthetics, we follow a common metric proposed in [14], which calculates the alignment between components in a web page. Based on these metrics, we propose a new fitness function that optimizes both the usability and aesthetics scores simultaneously. We propose two repair algorithms to search for a solution that not only addresses mobile-friendliness problems but also ensures a pleasing layout. In the first algorithm, we focus on the accuracy of the solution, aiming for a high level of usability and aesthetics at a reasonable time cost. In the second algorithm, we trade accuracy for running time. Specifically, we try to reduce the running time as much as possible while maintaining an acceptable level of usability and aesthetics. The first algorithm follows Particle Swarm Optimization (PSO) [9], and the second is based on Tabu Search [8]. We compare our approach against MFix [14], the state of the art in mobile-friendly web repair. Similar to MFix, our algorithms generate and improve repair candidates over several generations. Different from MFix, at the core of our algorithms, we design a fitness function that optimizes for both usability and aesthetics.

We evaluated our approaches and MFix on 38 web pages, which are provided by Mahajan et al. [14], on two tasks: mobile-friendliness (usability) and aesthetic layout (aesthetics). Experimental results show that our approaches significantly outperform MFix. Notably, our PSO-based repair algorithm outperforms MFix on 29 of 38 web pages in the aesthetic layout task and on 26 of 38 web pages in the mobile-friendliness task. Also, our Tabu Search-based algorithm shows superior efficiency, despite being less effective than the PSO-based algorithm that we proposed. Finally, we also evaluate different parameter settings for PSO and Tabu Search to show their impact. Experiments with the Wilcoxon signed-rank test [25] show that the superiority of our approaches over MFix is statistically significant.

In summary, our contributions include:

• We propose a new fitness function that optimizes both the usability and aesthetics scores to guarantee generated solutions that not only address mobile-friendliness problems but also ensure a pleasing layout.

• We propose two algorithms, namely PBRA and TERA, to search for solutions based on the proposed fitness function. The first algorithm focuses on the accuracy of solutions, while the second algorithm reduces the running time.

• We demonstrate that our approach significantly outperforms the current state-of-the-art MFix on aesthetic layout and mobile-friendliness tasks.

The remainder of the paper is organized as follows. Section III introduces the overview of an automated web repair process. We present the mathematical formulation of the problem in Section IV and describe our proposed algorithms in Section V. Section VI presents the experimental results, and Section VIII concludes the paper.

II. RELATED WORK

Recently, there has been a growing interest in automated fixing of presentation issues in web pages. Mahajan et al. propose the XFix [15] technique, which repairs presentation failures arising from inconsistencies in the rendering of a website across different browsers, i.e., layout Cross Browser Issues (XBIs). XFix assumes that one of the browsers presents the correct presentation of the page, which must be rendered the same in other browsers. In this work, we focus on the usability and aesthetics of web pages, which is different from the XBIs addressed by XFix. That is, there is no correct reference available to the repair process in our approach. Mahajan et al. propose the MFix [14] technique, which is a novel automated approach for repairing mobile-friendly problems in web pages. They also provide effective metrics to assess the aesthetics and usability of web pages. However, MFix mainly focuses on usability while treating aesthetics as best-effort. Mahajan et al. recently proposed IFix++ [16], a search-based technique to automatically repair Internationalization Presentation Failures in web applications. However, this work only fixes layout disruption, while our work focuses on both usability and layout disruption at the same time.

Cassius [20] uses automated reasoning for debugging and repairing buggy CSS. The technique takes as input faulty source lines in CSS files and user-provided examples that can be used to guide the repair synthesis. These inputs are, however, not readily available in the problem domain that our work addresses. PhpRepair [22] and PhpSync [19] detect and repair HTML syntax issues. Wang et al. [24] propose to use static and dynamic analysis together for repairing web applications. This is achieved by propagating a generated presentation fix to the server-side source code.

Less relevant to our work in this paper is the recent research effort in automated repair of source code in the Java or C programming languages. Le Goues et al. pioneered automated fixing of bugs in C programs [13]. Research in automated bug fixing has since seen a plethora of proposed techniques for fixing bugs in both C and Java programs using an array of techniques, including search-based software engineering and formal methods [11], [12], [17], [18]. These approaches assume the existence of a test suite to guard the correctness of the programs under repair. Similar to search-based program repair, e.g., [12], [13], our work uses evolutionary search to find the best candidates. However, different from them, we do not rely on a test suite for patch validation; instead, we use readily available tools that can automatically assess usability and aesthetics to evaluate repair candidates.

Overall, despite the recent interest in repairing web pages, more work is still needed to push the boundary of this research area further. Our work attempts to enhance the existing state-of-the-art work on web page repair. The focus of our work is on repairing web pages while preserving usability and aesthetics altogether, which distinguishes it from prior work on this problem. In the next section, we describe the details of our proposed framework.

Fig. 1. Repaired page across several runs by MFix: (a) Run 1, (b) Run 2, (c) Run 3, (d) Run 4.

III. OUR FRAMEWORK

Our repair framework consists of three phases: segmentation, localization, and repair. The main novelty of our framework lies in the third phase, i.e., repair, while the first two phases are inspired by [14]. Below, we describe each phase in more detail.

The segmentation phase groups the elements of a web page into regions (segments). Each segment is a set of HTML elements that should be repaired together to guarantee the visual consistency of the repaired page. This can be achieved by readily available tools such as VIPS [6], Block-o-Matic [23], correlation clustering [7], and cluster-based partitioning [21]. In this work, we leverage the automated clustering-based partitioning algorithm proposed by Romero et al. [21] as our segmentation tool. This approach is based on the Document Object Model (DOM) tree of the page, in which each leaf element of the DOM tree is initially assigned to its own segment. Then, segments are merged if they are close enough, i.e., their distance, measured by the average depth of leaves in the DOM tree, is below a predefined threshold.

The localization phase consists of two steps. The first step classifies the issues associated with problematic segments that should be repaired. There are five main types of issues, namely font sizing, tap target spacing, content sizing, viewport configuration, and Flash usage, as reported by popular mobile testing tools [4], [1]. In this paper, we only focus on addressing the first three of these problems, following MFix [14]. The result of this step is a set of tuples of the form $\langle s, T \rangle$, where $s$ is a potentially problematic segment and $T$ is the set of issues associated with $s$. Note that $T$ is a subset of {font sizing, tap target spacing, content sizing}. To achieve this, the Google Mobile-Friendly Test Tool (GMFT) can be used.

Next, the second step determines the CSS properties that may need to be adjusted to patch the issues of the problematic segments. For example, a font-sizing issue may be patched by adjusting the font-size, line-height, width, and height properties. As all the elements in the same segment are interdependent, their CSS properties should be adjusted together. To address this problem, MFix [14] proposed a structure named the Property Dependence Graph (PDG), which represents the relationship between elements in a segment with respect to a given issue type. Specifically, each node of a PDG corresponds to an HTML element in the segment. Two nodes in the PDG are connected if there exists a dependency relationship between them. There is a function $M$ that maps each edge of the PDG to the ratio between the values of the CSS properties associated with the two end nodes of the edge. Intuitively, if we know the value corresponding to the root node of a PDG, we can deduce the values corresponding to the other nodes by using function $M$. This step results in a set of segments and their corresponding PDGs.
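The propagation from the root node can be sketched as follows. This is a minimal illustration and not MFix's implementation: it assumes the PDG is stored as an adjacency dictionary, and that `ratios` (standing in for the function $M$) maps an edge (parent, child) to the child's value divided by the parent's value. The element names and numbers are hypothetical.

```python
from collections import deque

def propagate(root, root_value, children, ratios):
    """Deduce a CSS property value for every node from the root's value.

    children: dict mapping a node to the nodes that depend on it
    ratios:   dict mapping an edge (parent, child) to value(child) / value(parent);
              this stands in for the function M (the ratio direction is an assumption)
    """
    values = {root: root_value}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            if child in values:  # already reached; also guards against cycles
                continue
            values[child] = values[parent] * ratios[(parent, child)]
            queue.append(child)
    return values

# Hypothetical segment: a heading whose font size drives three dependent elements.
children = {"h1": ["p", "a"], "p": ["span"]}
ratios = {("h1", "p"): 0.5, ("h1", "a"): 0.75, ("p", "span"): 1.0}
new_values = propagate("h1", 32.0, children, ratios)
```

Because every other value is derived from the root, a repair candidate only needs to carry one number per PDG, which keeps the search space small.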

The last phase, i.e., repair, determines the optimal values of the CSS properties for repairing all the issues found. In this paper, we focus on this phase and propose two algorithms, which are described in the following sections.

IV. FORMULATION AND FITNESS FUNCTION

The core of our algorithms is how we define a fitness function to search for the best solution during the evolution of repair candidates. A well-designed fitness function can lead to better repair candidates, and thus we carefully crafted the fitness function, as explained below. We first give the detailed formula for the fitness function that we designed and then explain the high-level idea behind the design of the function.

Given the potentially problematic segments, their issues, and the corresponding PDGs identified by the segmentation and localization algorithms described in the previous section, our objective is to determine the optimal patching solution to address all the issues. Let $S$ denote the set of all segments, $S = \{S_1, \ldots, S_n\}$. For each $S_i$ ($i = 1, \ldots, n$), let $I_i$ be the set of issues associated with $S_i$. Let $I_i^j$ be an issue belonging to $I_i$; then $j \in \{1, 2, 3\}$, where $I_i^1$, $I_i^2$, and $I_i^3$ correspond to issues related to font size, tap target spacing, and content sizing, respectively. Let $PDG\langle S_i, I_i^j \rangle$ denote the PDG corresponding to $S_i$ and $I_i^j$. As mentioned in Section III, to fix issue $I_i^j$, we just need to determine the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$ (the other elements can then be adjusted by using the PDG). Our problem can be mathematically formulated as follows.

Input: $\{S_i, I_i^j, PDG\langle S_i, I_i^j \rangle\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)

Output: $X = \{X_i^j\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$), where $X_i^j$ denotes the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$

Objectives:

Maximize the usability score $U(X)$
Minimize the aesthetic score $A(X)$

where $U(X)$ indicates the mobile-friendliness of the page, which can be measured by using PSIT, and the aesthetic score $A(X)$ quantifies the layout differences between the original page and a transformed page to which a candidate patch has been applied. The aesthetic score is measured based on the Segment Model (SM) and the Intra-Segment Model (ISM), where:

• A Segment Model (SM) is defined as a directed complete graph in which the nodes are the segments identified in the segmentation phase and the edges are labeled with relationships of the following types: (1) intersection, (2) containment, and (3) directional (i.e., above, below, left, and right).

• An Intra-Segment Model (ISM) is similar to the SM but is built for each segment, and its nodes are the HTML elements within the segment.

Using these models, the aesthetic score is computed by summing the size of the symmetric difference between each edge's labels in the SM and ISMs of the original page and those of the transformed page [14].

This is a multi-objective optimization problem, and thus we define the total fitness function as follows:

$$F(X) = \alpha \times E(U(X) - 80) + \beta A(X)$$

where $\alpha$ and $\beta$ are negative and positive weight factors, respectively, that can be tuned to obtain the best performance, and $E(Y)$ is a function defined as

$$E(Y) = \begin{cases} G(Y) & \text{if } Y < 0 \\ Y - 1 & \text{otherwise} \end{cases}$$

We explain the high-level idea of the above fitness function as follows. First, it is worth noting that the maximum mobile-friendliness score given by the Google PageSpeed Insights Tool is 100 and that the usability score should be at least 80 to guarantee the page's friendliness, as indicated in [14]. Motivated by this observation, we design the first term of the fitness function (i.e., the one representing the usability score) such that its value is significantly large when the usability score is smaller than the threshold, which is 80. Therefore, $G(Y)$ should be a reciprocal function that increases rapidly compared to a linear function. The selection of $G(Y)$ is discussed in detail in Section VI.

Note that, as $\alpha$ is a negative real number and $\beta$ is a positive real number, maximizing $U(X)$ and minimizing $A(X)$ correspond to minimizing $F(X)$. Accordingly, the objective now is to minimize the fitness function $F(X)$.
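To make the behavior of $F(X)$ around the threshold concrete, the following sketch evaluates it for two candidates. The concrete form of $G(Y)$ is not given in this section, so `penalty_g` is only a placeholder assumption (a negative quadratic), and the weights $\alpha = -1$ and $\beta = 1$ are likewise illustrative.

```python
def penalty_g(y):
    """Placeholder for G(Y); the quadratic form is an assumption, not the paper's choice."""
    return -(y ** 2)

def e(y, g=penalty_g):
    """E(Y) = G(Y) if Y < 0, and Y - 1 otherwise."""
    return g(y) if y < 0 else y - 1

def fitness(usability, aesthetic, alpha=-1.0, beta=1.0, g=penalty_g):
    """F(X) = alpha * E(U(X) - 80) + beta * A(X); lower values are better."""
    if alpha >= 0 or beta <= 0:
        raise ValueError("alpha must be negative and beta must be positive")
    return alpha * e(usability - 80, g) + beta * aesthetic

# Same aesthetic score, usability above versus below the threshold of 80.
good = fitness(usability=90, aesthetic=5)
bad = fitness(usability=70, aesthetic=5)
```

With these placeholder choices, the candidate below the threshold receives a much larger (worse) fitness value than the one above it, which is the effect the first term is designed to have.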

V. PROPOSED ALGORITHMS

The core of our method is a Particle Swarm Optimization (PSO) algorithm that repairs web pages by searching for the optimal value of the fitness function proposed in Section IV. PSO iteratively tries to improve candidate solutions, here dubbed particles, by moving these particles around in the search space according to simple mathematical formulae over the particles' position and velocity. Using PSO instead of random search helps to leverage knowledge about the changing trends of particles, improving and directing particles in the next iterations. PSO often provides high-quality solutions, but it may have a low convergence rate, which can be time-consuming. To tackle this issue, we also propose a second approach, which optimizes for running time. This second approach is based on the Tabu Search algorithm, which shows a higher convergence rate than PSO, as will be discussed in Section VI.

In the following, we describe the proposed algorithms in detail.

A. PBRA: A PSO-Based Repair Algorithm

In this section, we leverage the Particle Swarm Optimization approach to determine the optimal values for the CSS properties that need to be adjusted. Basically, PSO simulates the behavior of bird flocking. In PSO, every single solution can be seen as a bird, and we call it a particle. The goodness of every particle is measured by a fitness value based on a fitness function. All of the particles have velocities that direct their flight by following the current optimum particles. PSO starts with a set of random particles and searches for the optimum by updating generations. In each iteration, each particle is updated by following the best values found so far. Algorithm 1 presents the details of our proposed algorithm, which we describe in more detail below.

Algorithm 1: PBRA
Input: The set of segments and their corresponding issues and PDGs, $\{S_i, I_i^j, PDG\langle S_i, I_i^j \rangle\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)
Output: Repair values for the root nodes of the PDGs
    $x \leftarrow$ a candidate suggested by GMFT [3]
    Initialize the first generation of $m$ particles, $P^0 = \{P_1^0, P_2^0, \ldots, P_m^0\}$, around $x$ by using a Gaussian distribution
    $gBest \leftarrow P_1^0$
    for $P_t^0 \in P^0$ do
        $v^0(P_t) \leftarrow$ random number in $(0, 1)$
        $pBest_t \leftarrow P_t^0$
        if $F(pBest_t) \le F(gBest)$ then
            $gBest \leftarrow pBest_t$
        end if
    end for
    $k \leftarrow 0$
    while the termination condition has not been met do
        for $P_t^k \in P^k$ do
            $v^{k+1}(P_t) \leftarrow Update(v^k(P_t), pBest_t, gBest, P_t^k)$
            $P_t^{k+1} \leftarrow P_t^k + v^{k+1}(P_t)$
            if $F(P_t^{k+1}) < F(pBest_t)$ then
                $pBest_t \leftarrow P_t^{k+1}$
            end if
            if $F(pBest_t) \le F(gBest)$ then
                $gBest \leftarrow pBest_t$
            end if
        end for
        $k \leftarrow k + 1$
    end while
    return $gBest$

Let us assume that we have a set of $m$ particles, denoted as $P = \{P_1, \ldots, P_m\}$. The algorithm then follows the steps below.

• Step 1: The first step in the process of a Particle Swarm Optimization is the generation of an initial population. Each particle of the initial population represents a possible solution to the problem. The initialization process is often designed to ensure the diversity of the population, which plays an important role in population-based algorithms. In mobile-friendly problems, we create the first particle by using the value suggested by GMFT [3]. Then, we generate the rest of the initial population by perturbing adjustment factors based on a Gaussian distribution around the original values of the first particle.

• Step 2: Let $gBest$ denote the best value obtained so far by all particles in the population, and let $pBest_t$ denote the best value that the $t$-th particle (i.e., $P_t$) has achieved so far. If the current value of the $t$-th particle is better than $pBest_t$, then $pBest_t$ is replaced by the current value of the $t$-th particle. The algorithm then searches all particles to find the candidate with the best fitness (i.e., the lowest value of $F$) and assigns it to $gBest$.

• Step 3: The algorithm leverages information obtained from previous generations to improve the quality of particles by adjusting each particle's direction based on $gBest$ and $pBest_t$, which are determined in Step 2. In detail, the velocity and the updated value of every particle are defined as follows. The velocity of particle $P_t$ at iteration $k + 1$ (denoted as $v^{k+1}(P_t)$) is determined as

$$v^{k+1}(P_t) = w \times v^k(P_t) + c_1 \times r_1 \times (pBest_t - P_t^k) + c_2 \times r_2 \times (gBest - P_t^k)$$

where $v^k(P_t)$ is the velocity of particle $P_t$ at the previous iteration. As in standard PSO, $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$ and $r_2$ are random numbers in $[0, 1]$. Based on the velocity, particle $P_t$ updates its value as follows:

$$P_t^{k+1} = P_t^k + v^{k+1}(P_t)$$

• Step 4: Steps 2 and 3 are repeated until the termination condition is satisfied. The final value of $gBest$ is then deemed the optimal solution. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
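The four steps can be sketched in a few lines of Python. This is a simplified illustration and not the authors' Java implementation: the parameter values (`m`, `sigma`, `w`, `c1`, `c2`), the per-dimension random factors, and the toy fitness function are all assumptions, and a real run would evaluate $F(X)$ through PSIT and the aesthetic models instead.

```python
import random

def pbra(fitness, seed_candidate, m=10, max_evals=500, sigma=0.1,
         w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimize `fitness` with PSO, starting from a tool-suggested candidate."""
    rng = rng or random.Random(0)
    evals = 0

    def f(x):
        nonlocal evals
        evals += 1  # the budget counts calls to the fitness function
        return fitness(x)

    dim = len(seed_candidate)
    # Step 1: the seed plus Gaussian perturbations of it form the first generation.
    particles = [list(seed_candidate)] + [
        [v + rng.gauss(0, sigma) for v in seed_candidate] for _ in range(m - 1)
    ]
    velocities = [[rng.random() for _ in range(dim)] for _ in range(m)]
    # Step 2: personal bests and the global best.
    p_best = [list(p) for p in particles]
    p_best_f = [f(p) for p in particles]
    best = min(range(m), key=lambda t: p_best_f[t])
    g_best, g_best_f = list(p_best[best]), p_best_f[best]

    # Steps 3 and 4: move the particles until the evaluation budget is spent.
    while evals + m <= max_evals:
        for t in range(m):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[t][d] = (w * velocities[t][d]
                                    + c1 * r1 * (p_best[t][d] - particles[t][d])
                                    + c2 * r2 * (g_best[d] - particles[t][d]))
                particles[t][d] += velocities[t][d]
            value = f(particles[t])
            if value < p_best_f[t]:
                p_best[t], p_best_f[t] = list(particles[t]), value
            if p_best_f[t] <= g_best_f:
                g_best, g_best_f = list(p_best[t]), p_best_f[t]
    return g_best, g_best_f

def toy_fitness(x):
    """Hypothetical stand-in for F(X): squared distance to an arbitrary target."""
    return sum((v - 1.5) ** 2 for v in x)

seed = [1.0, 1.0, 1.0]
solution, value = pbra(toy_fitness, seed)
```

Because the tool-suggested candidate is itself one of the initial particles, the returned solution is never worse than that starting point under the fitness function used.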

B. TERA: A Time-Efficient Repair Algorithm

In this section, we present the details of a repair algorithm based on Tabu Search. The insight of our algorithm is that we start with a repair candidate suggested by GMFT [3]. Then, we attempt to iteratively improve repair candidates over generations, following a genetic programming approach [10]. Algorithm 2 shows more details of our proposed algorithm. We explain Algorithm 2 step by step below.

Algorithm 2: TERA
Input: The set of segments and their corresponding issues and PDGs, $\{S_i, I_i^j, PDG\langle S_i, I_i^j \rangle\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)
Output: Repair values for the root nodes of the PDGs
    $x \leftarrow$ a candidate suggested by GMFT [3]
    $sBest \leftarrow x$
    $bestCandidate \leftarrow x$
    $tabuList \leftarrow [\,]$
    $tabuList.push(x)$
    while the termination condition has not been met do
        $sNeighborhood \leftarrow getNeighbors(bestCandidate)$
        for $sCandidate \in sNeighborhood$ do
            if $F(sCandidate) \le F(bestCandidate)$ and $sCandidate \notin tabuList$ then
                $bestCandidate \leftarrow sCandidate$
            end if
        end for
        if $F(bestCandidate) \le F(sBest)$ then
            $sBest \leftarrow bestCandidate$
        end if
        $tabuList.push(bestCandidate)$
        if $tabuList.size() > size_{tabu}$ then
            $tabuList.removeFirst()$
        end if
    end while
    return $sBest$

• Step 1: First, the algorithm takes the repair candidate suggested by GMFT [3] and assigns it as the best candidate, denoted by $sBest$. It then inserts $sBest$ into $tabuList$, which is a list of the candidates that have been visited.

• Step 2: Let $bestCandidate$ be the best repair candidate found in the previous iteration, and suppose that $bestCandidate$ is represented by

$$bestCandidate = \{x_1^1, x_1^2, x_1^3, \ldots, x_n^1, x_n^2, x_n^3\}$$

where $x_i^j$ is the value for repairing issue $I_i^j$ of segment $S_i$. The algorithm then determines the neighbors of $bestCandidate$, whose values fall into the following ranges: $\{x_1^1 \pm \delta, x_1^2 \pm \delta, x_1^3 \pm \delta, \ldots, x_n^1 \pm \delta, x_n^2 \pm \delta, x_n^3 \pm \delta\}$. Among the neighbors, it then assigns the one with the best fitness (i.e., the lowest value of $F$) as $bestCandidate$ of the current iteration. If $bestCandidate$ achieves a better fitness than the current $sBest$, then $sBest$ is replaced by $bestCandidate$. $bestCandidate$ is then inserted into $tabuList$.

• Step 3: Step 2 is repeated until the termination condition is met. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
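A compact sketch of these steps is given below. It is an illustration under stated assumptions rather than the authors' implementation: the neighborhood is assumed to change one coordinate at a time by $\pm\delta$ (the text only states that neighbor values fall within $\pm\delta$), and `toy_fitness`, `delta`, `tabu_size`, and `max_evals` are hypothetical choices.

```python
from collections import deque

def get_neighbors(candidate, delta):
    """Neighbors that change one coordinate by -delta or +delta (an assumed move set)."""
    neighbors = []
    for i, value in enumerate(candidate):
        for step in (-delta, delta):
            neighbors.append(candidate[:i] + (value + step,) + candidate[i + 1:])
    return neighbors

def tera(fitness, seed_candidate, delta=0.25, tabu_size=20, max_evals=200):
    """Minimize `fitness` with a tabu-list search in the style of Algorithm 2."""
    x = tuple(seed_candidate)
    s_best, s_best_f = x, fitness(x)
    best_candidate, best_candidate_f = s_best, s_best_f
    tabu = deque([x])
    evals = 1
    while evals < max_evals:
        evaluated = False
        for candidate in get_neighbors(best_candidate, delta):
            if candidate in tabu:       # skip recently visited candidates
                continue
            if evals >= max_evals:      # evaluation budget exhausted
                break
            value = fitness(candidate)
            evals += 1
            evaluated = True
            if value <= best_candidate_f:
                best_candidate, best_candidate_f = candidate, value
        if best_candidate_f <= s_best_f:
            s_best, s_best_f = best_candidate, best_candidate_f
        tabu.append(best_candidate)
        if len(tabu) > tabu_size:
            tabu.popleft()
        if not evaluated:               # every neighbor is tabu; stop early
            break
    return s_best, s_best_f

def toy_fitness(x):
    """Hypothetical stand-in for F(X): squared distance to an arbitrary target."""
    return sum((v - 1.5) ** 2 for v in x)

solution, value = tera(toy_fitness, (1.0, 1.0, 1.0))
```

Each iteration costs at most two fitness calls per coordinate, so the number of evaluations per step is small and fixed, which is consistent with the goal of reducing running time.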

C. Implementation Details

We implemented our approach in Java, consisting of approximately 5000 lines of code. For identifying the mobile-friendly problems in a web page, we used well-established tools such as the GMFT [3] and PSIT [4] APIs. For evaluating aesthetics, we identify segments in a web page and build the Segment Model and Intra-Segment Model by building a DOM tree, following a method similar to that in [14], with Chrome browser v60.0 and Selenium WebDriver.

VI. EXPERIMENTAL RESULTS

In this section, we empirically compare our approach with the well-known automated webpage repair algorithm MFix [14] on a dataset of real-world webpages. Below, we describe our experimental methodology, including the dataset and evaluation metrics, followed by research questions and numerical results.

A. Dataset and Evaluation Metrics

To evaluate all algorithms, we perform the experiments on the Alexa dataset used by MFix [14]. The dataset contains 38 real-world subject web pages collected from the top 50 most visited websites across all seventeen categories. Table I presents the dataset's details, such as the URL and category of each web page. The dataset can be publicly accessed in the GitHub repository of MFix†. In terms of evaluation, we use two evaluation metrics proposed by Mahajan et al. [14]:

• Usability Score: We use the Google PageSpeed Insights Tool [4], which assigns pages a score in the range of 0 to 100, with a score of at least 80 indicating a mobile-friendly page (see Section IV), to evaluate the usability of a web page. The higher the score, the better the usability of the web page.

• Aesthetic Score: The aesthetic score is a metric proposed in [14] for measuring how visually pleasing a web page looks. Based on the assumption that the original web pages have an aesthetic layout, it accounts for the differences in relative visual positioning, as captured by the Segment Model and Intra-Segment Models, between the machine-patched and original web pages. In this way, the lower the aesthetic score, the more aesthetically pleasing the page is. Note that in this study, we chose to use automated validation, which measures the differences between the repaired page and the original page, instead of manual human evaluation to account for aesthetics. Although we acknowledge the imperfection of automated evaluation, human evaluation is expensive (e.g., the cost of hiring professional developers to ensure the quality of the manual evaluation) and still suffers from human biases. Automated validation, on the other hand, is less expensive and does not suffer from human biases. This is why we choose automated validation instead of manual human validation.

†https://github.com/USC-SQL/mfix/tree/master/ICSE_paper_data/subjects

For each web page, we run TERA, PBRA, and MFix ten times each and report the average to mitigate the non-determinism inherent in the approximation algorithms. To ensure fairness between algorithms, we also use the same number of evaluations (number of fitness calls) for all algorithms. We also perform Wilcoxon signed-rank tests [25] to check that the performance difference between our proposed solutions and the baseline is statistically significant. Our experiments were conducted on a MacBook Pro with an Intel Core i7 2.2 GHz processor and 16 GB of RAM, running macOS Mojave 10.14.5.
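To make the evaluation protocol concrete, the following sketch averages repeated runs per page and then computes the two rank sums used by the Wilcoxon signed-rank test on paired per-page scores. It is a minimal illustration rather than the scripts used in our experiments: the helper names and the numbers in the usage example are hypothetical, and the conversion of the rank sums into a p-value is omitted.

```python
from statistics import mean


def average_runs(runs_per_page):
    """Average the scores of repeated runs for each page."""
    return [mean(runs) for runs in runs_per_page]


def wilcoxon_rank_sums(a, b):
    """Return (W+, W-) for paired samples a and b.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Made-up usability scores for five pages, for illustration only.
    ours = [85, 90, 78, 92, 88]
    baseline = [80, 85, 80, 86, 81]
    print(wilcoxon_rank_sums(ours, baseline))
```

A small value of min(W+, W-) relative to the number of pairs indicates that one technique is consistently ahead of the other across pages.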

B. Research Questions

We answer four research questions, as described below.

RQ1: How effective is our approach in repairing mobile-friendly problems in web pages as compared to MFix? In this research question, we investigate the effectiveness of our approach, as compared to MFix, in generating repairs for the benchmark dataset described in Section VI-A. Note that our comparison covers both the mobile-friendliness task and the aesthetic layout task.

RQ2: How do different fitness functions impact the effectiveness of PBRA and TERA? By default, our approach uses a fitness function in which G(Y) is quadratic. In this research question, we compare different fitness functions to explain why we selected the quadratic function as the default.

RQ3: How efficient is TERA as compared to PBRA? In this question, we investigate the efficiency of TERA and PBRA by comparing their convergence rates.

RQ4: How do different population sizes impact the effectiveness of PBRA? By default, our approach uses an initial population size of 10. In this research question, we compare different values of the initial population size to explain why 10 is the best option.

TABLE I: Statistics of real-world web pages

| ID | URL | Category |
|----|-----|----------|
| 1 | https://www.discogs.com | Arts |
| 2 | https://xkcd.com | Arts |
| 3 | http://www.wiley.com | Shopping |
| 4 | http://forum.gsmhosting.com/vbb | Home |
| 5 | https://www.irs.gov | Home |
| 6 | https://travel.state.gov | Home |
| 7 | https://arxiv.org | Science |
| 8 | https://bitcointalk.org | Science |
| 9 | http://coinmarketcap.com | Science |
| 10 | http://www.intellicast.com | Science |
| 11 | https://www.ncbi.nlm.nih.gov | Science |
| 12 | http://sigmaaldrich.com | Science |
| 13 | http://www.weather.gov | Science |
| 14 | https://boardgamegeek.com | Games |
| 15 | http://www.finalfantasyxiv.com | Games |
| 16 | http://www.mmo-champion.com | Home |
| 17 | http://www.nexusmods.com | Games |
| 18 | http://nvidia.com | Games |
| 19 | http://www.square-enix.com | Games |
| 20 | https://www.wowprogress.com | Games |
| 21 | https://bulbagarden.net | Kids and teens |
| 22 | http://lolcounter.com | Kids and teens |
| 23 | http://www.bom.gov.au | Kids and teens |
| 24 | http://onlinelibrary.wiley.com | Business |
| 25 | http://aamc.org | Health |
| 26 | https://www.fragrantica.com | Health |
| 27 | http://us.battle.net | Kids and teens |
| 28 | http://blizzard.com | Kids and teens |
| 29 | http://drudgereport.com | News |
| 30 | https://www.irctc.co.in | Regional |
| 31 | http://dict.cc | Reference |
| 32 | https://www.leo.org | Reference |
| 33 | http://correios.com.br | Society |
| 34 | http://www.flashscore.com | Sports |
| 35 | http://letour.fr | Sports |
| 36 | http://rotoworld.com | Sports |
| 37 | http://us.soccerway.com | Sports |
| 38 | http://myway.com | Computers |

Fig. 2. Comparison of TERA, PBRA, and MFix in usability. The higher the usability score, the more mobile-friendly the page. Values above the red horizontal line drawn at 80 indicate that GMFT considers a page to be mobile-friendly.

TABLE II: Default settings for PBRA

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Population size | $T$ | 10 |
| Number of evaluations | $E$ | 150 |
| Inertial coefficient (start) | $\omega_s$ | 0.9 |
| Inertial coefficient (end) | $\omega_e$ | 0.4 |
| Acceleration coefficient (personal) | $c_1$ | 0.5 |
| Acceleration coefficient (global) | $c_2$ | 0.5 |
| Coefficient of randomness | $r_1, r_2$ | $[0, 1]$ |
| Neighbor range | $\Delta$ | 0.3 |
| Fitness function | $G(Y)$ | $-Y^2$ |

TABLE III: Default settings for TERA

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Neighbor size | $size_{neighborhood}$ | 10 |
| Number of evaluations | $E$ | 150 |
| Size of tabu list | $size_{tabu}$ | 5 |
| Neighbor range | $\delta$ | 0.3 |
| Fitness function | $G(Y)$ | $-Y^2$ |

C. Numerical Results

RQ1: Our approach's effectiveness as compared to MFix. To answer RQ1, we compare the effectiveness of our approach, including PBRA and TERA (with the default settings in Table II and Table III), with MFix [14], a well-known automated repair technique for mobile-friendly problems. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III is explained in RQ2 and RQ4.

Fig. 3. Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page is. Values below the red horizontal line drawn at 100 indicate improvements over the baseline.

Fig. 4. Impact of the fitness function on the convergence rate: (a) usability, (b) aesthetics.

Experimental results show that PBRA outperforms MFix by generating repairs with much better usability and aesthetic scores. Note that, as described in Section VI-A, the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing a web page is. Figure 2 shows that PBRA outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 shows the usability of the web pages repaired by TERA, PBRA, and MFix. The red line marks the threshold value of 80, which is the minimum PSIT score for a page to be considered to have good usability by GMFT [3]. Most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3, which shows the relative performance of our approaches versus the baseline technique MFix, obtained by dividing the aesthetic score of the PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We normalize the results in this way because the aesthetic score has a wide range: it can take any value greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects. TERA also shows considerably better results than MFix, on 24 out of 38 subjects.
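The normalization behind Figure 3 amounts to a per-page ratio, which can be sketched as follows. The function name and the example numbers are hypothetical; the ratio is expressed as a percentage so that values below 100 correspond to an improvement over MFix, matching the red line in the figure.

```python
def relative_aesthetic(ours, mfix):
    """Per-page aesthetic score of our patch relative to MFix, in percent.

    Assumes every MFix score is strictly positive.
    """
    return [100.0 * o / m for o, m in zip(ours, mfix)]


if __name__ == "__main__":
    # Made-up aesthetic scores for three pages, for illustration only.
    ratios = relative_aesthetic([50, 120, 30], [100, 100, 60])
    improved = sum(1 for r in ratios if r < 100)
    print(ratios, improved)
```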

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results indicate that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003, and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041, and effect sizes of 0.991 and 0.435, respectively).

RQ2: Impacts of the fitness function. In Section IV, we introduced a novel fitness function that combines the usability score and the aesthetic score. In the formula, the function G(Y) plays an important role in making adjustments that speed up the search for mobile-friendly patches. However, G(Y) also needs to be designed so as not to ignore potentially good candidate patches.

We chose several functions that satisfy the conditions on G(Y) (see Section IV) to experiment with:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$
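For intuition about how differently these candidates scale, the sketch below evaluates each function, exactly as written above, at a few negative values of Y, i.e., at usability scores below the threshold of 80. The dictionary `CANDIDATES` and the helper `magnitudes` are names introduced only for this illustration.

```python
import math

# The three candidate forms of G(Y), transcribed from the list above.
CANDIDATES = {
    "exponential": lambda y: math.exp(-y),
    "quadratic": lambda y: -(y ** 2),
    "cubic": lambda y: y ** 3,
}


def magnitudes(y):
    """Absolute value of each candidate G at Y = y."""
    return {name: abs(g(y)) for name, g in CANDIDATES.items()}


if __name__ == "__main__":
    for shortfall in (-1, -5, -10):  # Y = U(X) - 80 for U(X) = 79, 75, 70
        print(shortfall, magnitudes(shortfall))
```

At Y = -10, the exponential form is already larger in magnitude than the cubic form by more than an order of magnitude, which gives a sense of how strongly it prioritizes reaching the usability threshold.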

We compare the effectiveness of the fitness functions on two tasks: the convergence rate, i.e., how fast a fitness function helps the approaches converge, and the overall result. For the overall result, Figure 4 shows that the quadratic function is the most effective, followed by the cubic function and the exponential function. For the convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations is increased, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time but achieve better results. One potential reason is that the exponential fitness function focuses on improving the usability score beyond the threshold, so that it ignores potentially good candidate patches. In conclusion, the quadratic function is the best choice for both PBRA and TERA. We therefore chose the quadratic function as the default fitness function for our approach.

RQ3: Convergence rates of PBRA and TERA. RQ1 has demonstrated that PBRA outperforms TERA (36 out of 38 on the mobile-friendliness test and 30 out of 38 on aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to the one shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges after 50 evaluations, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA needs only 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of mobile-friendliness for repaired pages, but TERA consumes less running time.

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both converge after 150 evaluations. The converged value of PBRA is much smaller than that of TERA: specifically, PBRA attains an aesthetic score equal to 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of population size. Population size plays an essential role in population-based meta-heuristic methods. A small population may not be sufficient to find good results, while a large population may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. From the figure, we can see that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 show worse results. Small populations are more likely to lose diversity over time because mating between individuals with similar genetic structures occurs regularly. The search space of the algorithm is therefore narrowed, which usually leads to poor results, as with the population size of 5 in this case. In contrast, large populations can make the algorithm consume more computation time in finding an optimal solution, which is why a population size of 15 yields poor results. This observation is further reinforced by the experiment on the overall result, shown in Figure 7 with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity: Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.

Internal Validity: Threats to internal validity refer to errors in our implementation and experiments. To mitigate this risk, we have carefully examined our code and experiments to avoid potential errors. We have also repeated our experiments several times to ensure that the reported results are correct.

Construct Validity: Threats to construct validity correspond to the suitability of our evaluation metrics. Following MFix, we consider a web page mobile-friendly if it passes a threshold value of 80, which is the minimum PSIT score for good usability. These ratings are, however, subject to the PSIT evaluation criteria only. Although PSIT is a popular tool for assessing usability, we plan to include other tools to assess usability more thoroughly.

Fig. 5. Comparison of the convergence rates of TERA and PBRA: (a) usability, (b) aesthetics.

Fig. 6. Impact of population size on the convergence rate: (a) usability, (b) aesthetics.

VIII. CONCLUSION

In this paper, we introduced automated approaches based on meta-heuristic algorithms for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics) at the same time. To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluation on a dataset of 38 real-world defective web pages shows that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithms on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function that better optimizes for both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

Fig. 7. Impact of population size on the overall result: (a) usability, (b) aesthetics.

REFERENCES

[1] "Bing Mobile Friendly Test Tool," https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved March 2020.

[2] "eMarketer Releases Updated Estimates for US Digital Users," https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved March 2020.

[3] "Google Mobile Friendly Test Tool," https://search.google.com/test/mobile-friendly, 2017, retrieved March 2020.

[4] "Google PageSpeed Insights Tool," https://developers.google.com/speed/pagespeed/insights/, 2017, retrieved March 2020.

[5] "Google 2018 Consumer Study," https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved March 2020.

[6] M. E. Akpinar and Y. Yesilada, "Vision based page segmentation algorithm: Extended and perceived success," in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to webpage segmentation," in Proceedings of the 17th International Conference on World Wide Web, ser. WWW '08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, "Tabu search - Part I," ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, "S3: Syntax- and semantic-guided repair synthesis via programming by examples," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, "History driven program repair," in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: A generic method for automatic software repair," IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, "Automated repair of mobile friendly problems in web pages," in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE '18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, "Automated repair of layout cross browser issues using search-based techniques," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, "Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques," Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, "Angelix: Scalable multiline program patch synthesis via symbolic analysis," in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, "SemFix: Program repair via semantic analysis," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, "Auto-locating and fix-propagating for HTML validation errors to PHP server-side code," in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, "Automated reasoning for web page layout," in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, "Automatic partitioning of web pages using clustering," in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, "Automated repair of HTML generation errors in PHP applications using string constraint solving," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, "Block-o-Matic: A web page segmentation framework," in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, "Automating presentation changes in dynamic web applications via collaborative hybrid analysis," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.


problematic, as it has cluttered navigation.

In this work, we focus on automated repair of web pages, considering both usability and aesthetics. To assess usability, we use PSIT [4], which rates how mobile-friendly a web page is on a scale from 0 to 100: the higher the score, the more usable the web page. To assess aesthetics, we follow a common metric proposed in [14], which calculates the alignment between components in a web page. Based on these metrics, we propose a new fitness function that optimizes both the usability and aesthetics scores simultaneously. We propose two repair algorithms to search for a solution that not only addresses mobile-friendliness problems but also ensures a pleasing layout. In the first algorithm, we focus on the accuracy of the solution, aiming for a high level of usability and aesthetics at a reasonable time cost. In the second algorithm, we trade accuracy for running time: we try to reduce the running time as much as possible while maintaining an acceptable level of usability and aesthetics. The first algorithm follows Particle Swarm Optimization (PSO) [9], and the second is based on Tabu Search [8]. We compare our approach against MFix [14], the state of the art in mobile-friendly web repair. Similar to MFix, our algorithms generate and improve repair candidates over several generations. Different from MFix, at the core of our algorithms we design a fitness function that optimizes for both usability and aesthetics.

We evaluated our approaches and MFix on 38 web pages provided by Mahajan et al. [14] on two tasks: mobile-friendliness (usability) and aesthetic layout (aesthetics). Experimental results show that our approaches significantly outperform MFix. Notably, our PSO-based repair algorithm outperforms MFix on 29 of 38 web pages in the aesthetic layout task and on 26 of 38 web pages in the mobile-friendliness task. Our Tabu Search-based algorithm shows superior efficiency, despite being less effective than the PSO-based algorithm. Finally, we evaluate different parameter settings for PSO and Tabu Search to show their impact. Experiments with the Wilcoxon signed-rank test [25] show that the superiority of our approaches over MFix is significant.

In summary, our contributions include:

• We propose a new fitness function that optimizes both the usability and aesthetics scores, so that generated solutions not only address mobile-friendliness problems but also ensure a pleasing layout.

• We propose two algorithms, namely PBRA and TERA, to search for solutions based on the proposed fitness function. The first algorithm focuses on the accuracy of solutions, while the second reduces the running time.

• We demonstrate that our approach significantly outperforms the current state of the art, MFix, on the aesthetic layout and mobile-friendliness tasks.

The remainder of the paper is organized as follows. Section III gives an overview of the automated web repair process. We present the mathematical formulation of the problem in Section IV and describe our proposed algorithms in Section V. Section VI presents the experimental results, and Section VIII concludes the paper.

II. RELATED WORK

Recently, there has been growing interest in the automated fixing of presentation issues in web pages. Mahajan et al. propose XFix [15], a technique that repairs presentation failures arising from inconsistencies in the rendering of a website across different browsers, i.e., layout Cross Browser Issues (XBIs). XFix assumes that one of the browsers presents the correct rendering of the page, which must be rendered the same in other browsers. In this work, we focus on the usability and aesthetics of web pages, which differs from the XBIs addressed by XFix: there is no correct reference available to the repair process in our approach. Mahajan et al. also propose MFix [14], a novel automated approach for repairing mobile-friendly problems in web pages. They also provide effective metrics to assess the aesthetics and usability of web pages. However, MFix mainly focuses on usability while treating aesthetics on a best-effort basis. Mahajan et al. recently proposed IFix++ [16], a search-based technique to automatically repair Internationalization Presentation Failures in web applications. However, that work only fixes layout disruption, while our work addresses both usability and layout disruption at the same time.

Cassius [20] uses automated reasoning for debugging and repairing buggy CSS. The technique takes as input faulty source lines in CSS files and user-provided examples that can be used to guide the repair synthesis. These inputs are, however, not readily available in the problem domain that our work addresses. PhpRepair [22] and PhpSync [19] detect and repair HTML syntax issues. Wang et al. [24] propose using static and dynamic analysis together for repairing web applications. This is achieved by propagating a generated presentation fix to the server-side source code.

Less relevant to our work is recent research on automated repair of source code in the Java or C programming languages. Le Goues et al. pioneered work on the automated fixing of bugs in C programs [13]. Research in automated bug fixing has since seen a plethora of proposed techniques for fixing bugs in both C and Java programs, using an array of techniques including search-based software engineering and formal methods [11], [12], [17], [18]. These approaches assume the existence of a test suite to validate the correctness of the programs under repair. Similar to search-based program repair, e.g., [12], [13], our work uses evolutionary search to find the best candidates. However, unlike these approaches, we do not rely on a test suite for patch validation; instead, we use readily available tools that can automatically assess usability and aesthetics to evaluate repair candidates.

Overall, despite the recent interest in repairing web pages, more work is still needed to push the boundary of this research area further. Our work attempts to enhance the existing state-of-the-art work on web page repair. The focus of our work is on repairing web pages while preserving both usability and aesthetics, which distinguishes it from prior approaches to this problem. In the next section, we describe the details of our proposed framework.

Fig. 1. Repaired page across several runs by MFix: (a) Run 1, (b) Run 2, (c) Run 3, (d) Run 4.

III. OUR FRAMEWORK

Our repair framework consists of three phases: segmentation, localization, and repair. The main novelty of our framework lies in the third phase, i.e., repair, while the first two phases are inspired by [14]. Below, we describe each phase in more detail.

The segmentation phase groups the elements of a web page into regions. Each region is a set of HTML elements that should be repaired together to guarantee the visual consistency of the repaired page. This can be achieved with readily available tools such as VIPS [6], Block-o-Matic [23], correlation clustering [7], and cluster-based partitioning [21]. In this work, we leverage the automated clustering-based partitioning algorithm proposed by Romero et al. [21] as our segmentation tool. This approach is based on the Document Object Model (DOM) tree of the page, in which each leaf element of the DOM tree is initially assigned to its own segment. The segments are then merged if they are close enough, i.e., if their distance, measured by the average depth of leaves in the DOM tree, is below a predefined threshold.

The localization phase consists of two steps. The first step classifies the issues associated with problematic segments that should be repaired. There are five main types of issues, namely font sizing, tap target spacing, content sizing, viewport configuration, and Flash usage, as reported by popular mobile testing tools [4], [1]. In this paper, we focus only on addressing the first three of these problems, following MFix [14]. The result of this step is a set of tuples of the form $\langle s, T \rangle$, where $s$ is a potentially problematic segment and $T$ is the set of issues associated with $s$. Note that $T$ is a subset of {font sizing, tap target spacing, content sizing}. To achieve this, the Google Mobile-Friendly Test Tool (GMFT) can be used.

Next, the second step determines the CSS properties that may need to be adjusted to patch the issues of the problematic segments. For example, a font-sizing issue may be patched by adjusting the font-size, line-height, width, and height properties. As all the elements in the same segment have a reciprocal relationship, their CSS properties should be adjusted together. To address this problem, MFix [14] proposed a structure named the Property Dependence Graph (PDG), which represents the relationship between elements in a segment with respect to a given issue type. Specifically, each node of a PDG corresponds to an HTML element in the segment. Two nodes in the PDG are connected if there exists a dependency relationship between them. A function $M$ maps each edge of the PDG to the ratio between the values of the CSS properties associated with the two end nodes of the edge. Intuitively, if we know the value corresponding to the root node of a PDG, we can deduce the values corresponding to the other nodes by using the function $M$. This step results in a set of segments and their corresponding PDGs.
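The way a value chosen for the root node propagates through a PDG can be illustrated with a short sketch. The representation below is an assumption made for illustration only: the PDG is stored as an adjacency dictionary, the function $M$ as a dictionary from an edge (u, v) to the ratio value(v) / value(u), and the element names in the usage example are hypothetical.

```python
from collections import deque


def propagate_from_root(root, root_value, edges, ratio):
    """Deduce the property value of every node reachable from the root.

    edges: dict mapping a node to the list of nodes that depend on it.
    ratio: dict mapping an edge (u, v) to value(v) / value(u); this plays
           the role of the function M, and the direction is an assumption.
    """
    values = {root: root_value}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in edges.get(u, []):
            if v not in values:  # visit each node once
                values[v] = values[u] * ratio[(u, v)]
                queue.append(v)
    return values


if __name__ == "__main__":
    # Hypothetical segment: a div containing a paragraph with a span.
    edges = {"div": ["p"], "p": ["span"]}
    ratio = {("div", "p"): 0.875, ("p", "span"): 0.5}
    print(propagate_from_root("div", 16.0, edges, ratio))
```

Under this representation, a repair candidate only needs to carry one value per PDG root, which is what keeps the search space of the repair phase small.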

The last phase, i.e., repair, determines the optimal values of the CSS properties for repairing all the issues found. In this paper, we focus on this phase and propose two algorithms, which are described in the following sections.

IV. FORMULATION AND FITNESS FUNCTION

The core of our algorithms is the definition of a fitness function that guides the search for the best solution during the evolution of repair candidates. A well-designed fitness function can lead to better repair candidates, and thus we carefully crafted the fitness function as explained below. We first give the detailed formula for the fitness function and then explain the high-level idea behind its design.

Given the potentially problematic segments, their issues, and the corresponding PDGs identified by the segmentation and localization algorithms described in the previous section, our objective is to determine the optimal patching solution that addresses all the issues. Let $S$ denote the set of all segments, $S = \{S_1, \ldots, S_n\}$. For each $S_i$ ($i = 1, \ldots, n$), let $I_i$ be the set of issues associated with $S_i$. Let $I_i^j$ be an issue belonging to $I_i$; then $j \in \{1, 2, 3\}$, where $I_i^1$, $I_i^2$, and $I_i^3$ correspond to issues related to font size, tap target spacing, and content sizing, respectively. Let $PDG\langle S_i, I_i^j \rangle$ denote the PDG corresponding to $S_i$ and $I_i^j$. As mentioned in Section III, to fix issue $I_i^j$, we only need to determine the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$ (the other elements can then be adjusted by using the PDG). Our problem can be mathematically formulated as follows.

Input: $S_i$, $I_i^j$, $PDG\langle S_i, I_i^j \rangle$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)

Output: $X = \{X_i^j\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$), where $X_i^j$ denotes the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$

Objectives:

Maximize the usability score $U(X)$
Minimize the aesthetic score $A(X)$

where U(X) indicates the mobile friendliness of the pagewhich can be measured by using PSIT the aesthetic scorequantifies the layout differences between the original pageand a transformed page to which a candidate patch has beenapplied The aesthetic score is measured based on SegmentModel (SM) and Intra-Segment Model (ISM) where

• A Segment Model (SM) is defined as a directed complete graph where the nodes are the segments identified in the segmentation phase and the edges are labeled by one of three types of relationships: (1) intersection, (2) containment, and (3) directional (i.e., above, below, left, and right).

• An Intra-Segment Model (ISM) is similar to the SM but is built for each segment, and its nodes are the HTML elements within the segment.

Using these models, the aesthetic score is computed by summing the size of the symmetric difference between each edge's labels in the SM and ISM of the original page and the transformed page [14].
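The symmetric-difference computation can be illustrated with a small Python sketch. The data layout is an assumption made for illustration only: each model is a dictionary that maps a directed edge (a pair of node names) to a set of relationship labels, and the names `model_distance` and `aesthetic_score` are hypothetical, not taken from MFix [14].

```python
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[str, str]
# Assumed representation: edge -> set of relationship labels on that edge.
Model = Dict[Edge, FrozenSet[str]]


def model_distance(original: Model, patched: Model) -> int:
    """Sum, over all edges, the size of the symmetric difference of labels."""
    empty: FrozenSet[str] = frozenset()
    edges = set(original) | set(patched)
    return sum(
        len(original.get(edge, empty) ^ patched.get(edge, empty))
        for edge in edges
    )


def aesthetic_score(
    sm_original: Model,
    sm_patched: Model,
    isms_original: Dict[str, Model],
    isms_patched: Dict[str, Model],
) -> int:
    """Add the SM distance and the ISM distance of every original segment."""
    score = model_distance(sm_original, sm_patched)
    for segment, ism in isms_original.items():
        score += model_distance(ism, isms_patched.get(segment, {}))
    return score


# Hypothetical page with three segments and one intra-segment model.
sm_a = {("header", "nav"): frozenset({"above"}),
        ("nav", "main"): frozenset({"above"})}
sm_b = {("header", "nav"): frozenset({"above", "intersection"}),
        ("nav", "main"): frozenset({"above"})}
ism_a = {"nav": {("link1", "link2"): frozenset({"left"})}}
ism_b = {"nav": {("link1", "link2"): frozenset({"above"})}}

print(aesthetic_score(sm_a, sm_b, ism_a, ism_b))  # 1 (SM) + 2 (ISM) = 3
```

An unchanged layout yields a score of 0, which matches the convention that lower aesthetic scores are better.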

This is a multi-objective optimization problem, and thus we define the total fitness function as follows:

$$F(X) = \alpha \times E(U(X) - 80) + \beta A(X)$$

where $\alpha$ and $\beta$ are negative and positive weight factors, respectively, that can be tuned to obtain the best performance, and $E(Y)$ is a function defined as

$$E(Y) = \begin{cases} G(Y) & \text{if } Y < 0 \\ Y - 1 & \text{otherwise} \end{cases}$$

We explain the high-level idea of the above fitness function as follows. First, it is worth noting that the maximum mobile-friendliness score given by the Google PageSpeed Insights Tool is 100 and that the usability score should be at least 80 to guarantee the page's friendliness, as indicated in [14]. Motivated by this observation, we design the first term in the fitness function (i.e., the one representing the usability score) such that its value is significantly large when the usability score is smaller than the threshold of 80. Therefore, for negative arguments, the magnitude of $G(Y)$ should increase rapidly compared to a linear function. The selection of $G(Y)$ is discussed in detail in Section VI.

Note that, as $\alpha$ is a negative real number and $\beta$ is a positive real number, maximizing $U(X)$ and minimizing $A(X)$ correspond to minimizing $F(X)$. Accordingly, the objective now is to minimize the fitness function $F(X)$.
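As a worked illustration of the formula, the following Python sketch evaluates $F(X)$ from a usability score and an aesthetic score that are assumed to have been measured already. The weights `alpha = -1.0` and `beta = 1.0` are placeholder assumptions (the paper only states their signs), and $G(Y) = -Y^2$ is the default choice listed in Tables II and III.

```python
def G(y: float) -> float:
    """Default penalty term from Tables II and III: G(Y) = -Y^2."""
    return -(y ** 2)


def E(y: float) -> float:
    """Piecewise term: G(Y) below the threshold, Y - 1 otherwise."""
    return G(y) if y < 0 else y - 1


def fitness(usability: float, aesthetic: float,
            alpha: float = -1.0, beta: float = 1.0) -> float:
    """F(X) = alpha * E(U(X) - 80) + beta * A(X); lower is better.

    alpha and beta are illustrative values, not the tuned weights.
    """
    if alpha >= 0 or beta <= 0:
        raise ValueError("alpha must be negative and beta must be positive")
    return alpha * E(usability - 80) + beta * aesthetic


# A page 10 points below the threshold is penalized quadratically,
# while a page 10 points above it is rewarded only linearly.
print(fitness(70, 5.0))  # -1 * (-(10 ** 2)) + 5 = 105.0
print(fitness(90, 5.0))  # -1 * (10 - 1) + 5 = -4.0
```

With these placeholder weights, two candidates with the same aesthetic score differ by more than 100 fitness units when one of them falls below the usability threshold, which is the effect the design aims for.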

V. PROPOSED ALGORITHMS

The core of our method is a Particle Swarm Optimization (PSO) algorithm that repairs web pages by optimizing the fitness function proposed in Section IV. PSO iteratively tries to improve candidate solutions, here dubbed particles, by moving these particles around in the search space according to simple mathematical formulae over the particles' position and velocity. Using PSO instead of random search helps to leverage knowledge about the changing trends of particles, improving and directing particles in the next iterations. PSO often provides high-quality solutions, but it may have a low convergence rate, which can be time-consuming. To tackle this issue, we also propose a second approach, which optimizes for running time. This second approach is based on the Tabu search algorithm, which shows a higher convergence rate than PSO, as will be discussed in Section VI.

In the following, we describe the proposed algorithms in detail.

A. PBRA: A PSO-Based Repair Algorithm

In this section, we leverage the Particle Swarm Optimization approach to determine the optimal values for the CSS properties that need to be adjusted. Basically, PSO simulates the behavior of bird flocking. In PSO, every single solution can be seen as a bird, which we call a particle. The goodness of every particle is measured by a fitness value based on a fitness function. All of the particles have velocities that direct their flight by following the current optimum particles. PSO starts with a set of random particles and searches for the optimum by updating generations. In each iteration, each particle is updated by following the best values found so far. Algorithm 1 presents the details of our proposed algorithm, which we describe in more detail below.

Let us assume that we have a set of $T$ particles, denoted as $P = \{P_1, \dots, P_T\}$. The algorithm then follows the steps below.

Algorithm 1: PBRA

Input: The set of segments and their corresponding issues and PDGs: S_i, I_i^j, PDG⟨S_i, I_i^j⟩ (i = 1, ..., n; j ∈ {1, 2, 3})
Output: Repair values for the root nodes of the PDGs

    x ← a candidate suggested by GMFT [3]
    Initialize the first generation of T particles P^0 = {P_1^0, P_2^0, ..., P_T^0} by using a Gaussian distribution around x
    gBest ← P_1^0
    for P_t^0 ∈ P^0 do
        v^0(P_t) ← random number in (0, 1)
        pBest_t ← P_t^0
        if F(pBest_t) ≤ F(gBest) then
            gBest ← pBest_t
        end if
    end for
    k ← 0
    while the termination condition has not been met do
        for P_t^k ∈ P^k do
            v^{k+1}(P_t) ← Update(v^k(P_t), pBest_t, gBest, P_t^k)
            P_t^{k+1} ← P_t^k + v^{k+1}(P_t)
            if F(P_t^{k+1}) < F(pBest_t) then
                pBest_t ← P_t^{k+1}
            end if
            if F(pBest_t) ≤ F(gBest) then
                gBest ← pBest_t
            end if
        end for
        k ← k + 1
    end while
    return gBest

• Step 1: The first step in the process of Particle Swarm Optimization is the generation of an initial population. Each particle of the initial population represents a possible solution to the problem. The initialization process is often designed to ensure the diversity of the population, which plays an important role in population-based algorithms. For mobile-friendly problems, we create the first particle by using the value suggested by GMFT [3]. Then, we generate the rest of the initial population by perturbing the adjustment factors based on a Gaussian distribution around the original values of the first particle.

• Step 2: Let gBest denote the best value obtained so far by all particles in the population, and let pBest_t denote the best value that the t-th particle (i.e., P_t) has achieved so far. If the current value of the t-th particle is better than pBest_t, then pBest_t is replaced by the current value of the t-th particle. The algorithm then searches all particles to find the candidate with the best fitness (i.e., the lowest value of F, since F is minimized) and assigns it to gBest.

• Step 3: The algorithm leverages information obtained from previous generations to improve the quality of particles by adjusting each particle's direction based on gBest and pBest_t, which are determined in Step 2. In detail, the velocity and the updated value of every particle are defined as follows. The velocity of particle $P_t$ at iteration $k + 1$ (denoted as $v^{k+1}(P_t)$) is determined as

$$v^{k+1}(P_t) = w \times v^k(P_t) + c_1 \times r_1 \times (pBest_t - P_t^k) + c_2 \times r_2 \times (gBest - P_t^k)$$

where $v^k(P_t)$ is the velocity of particle $P_t$ at the previous iteration, $w$ is the inertia coefficient, $c_1$ and $c_2$ are the personal and global acceleration coefficients, and $r_1$ and $r_2$ are random coefficients in $[0, 1]$ (see Table II). Based on the velocity, particle $P_t$ updates its value as follows:

$$P_t^{k+1} = P_t^k + v^{k+1}(P_t)$$

• Step 4: Steps 2 and 3 are repeated until the termination condition is satisfied. The final value of gBest is then deemed the optimal solution. In this work, we use the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
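The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Java implementation: a candidate is a flat list of numbers, `fitness` is any function to be minimized, the inertia coefficient is assumed to decay linearly from `w_start` to `w_end`, and `sigma` (the spread of the Gaussian initialization) reuses the value 0.3 from Table II as a guess. The name `pbra_sketch` and the toy fitness function are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def pbra_sketch(
    seed: Vector,
    fitness: Callable[[Vector], float],
    pop_size: int = 10,
    max_evals: int = 150,
    w_start: float = 0.9,
    w_end: float = 0.4,
    c1: float = 0.5,
    c2: float = 0.5,
    sigma: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Tuple[Vector, float]:
    rng = rng or random.Random(0)
    dim = len(seed)
    # Step 1: the first particle is the seed candidate; the others are
    # Gaussian perturbations around it.
    particles = [list(seed)] + [
        [x + rng.gauss(0.0, sigma) for x in seed] for _ in range(pop_size - 1)
    ]
    velocities = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    # Step 2: personal bests and global best (lower fitness is better).
    p_best = [list(p) for p in particles]
    p_best_fit = [fitness(p) for p in particles]
    evals = pop_size
    best = min(range(pop_size), key=lambda t: p_best_fit[t])
    g_best, g_best_fit = list(p_best[best]), p_best_fit[best]

    # Step 4: repeat until the evaluation budget is exhausted.
    while evals < max_evals:
        # Assumed schedule: inertia decays linearly with the budget used.
        w = w_start - (w_start - w_end) * evals / max_evals
        for t in range(pop_size):
            if evals >= max_evals:
                break
            # Step 3: velocity and position update.
            r1, r2 = rng.random(), rng.random()
            for d in range(dim):
                velocities[t][d] = (
                    w * velocities[t][d]
                    + c1 * r1 * (p_best[t][d] - particles[t][d])
                    + c2 * r2 * (g_best[d] - particles[t][d])
                )
                particles[t][d] += velocities[t][d]
            fit = fitness(particles[t])
            evals += 1
            if fit < p_best_fit[t]:
                p_best[t], p_best_fit[t] = list(particles[t]), fit
            if p_best_fit[t] <= g_best_fit:
                g_best, g_best_fit = list(p_best[t]), p_best_fit[t]
    return g_best, g_best_fit


def toy_fitness(x: Vector) -> float:
    """Stand-in for F(X): squared distance to the point (1, 1)."""
    return sum((v - 1.0) ** 2 for v in x)


pso_best, pso_best_fit = pbra_sketch([0.0, 0.0], toy_fitness)
```

Because the seed candidate is itself part of the initial population, the returned solution is never worse than the seed under the given fitness function.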

B. TERA: A Time-Efficient Repair Algorithm

In this section, we present the details of a repair algorithm based on Tabu search. The insight of our algorithm is that we start with a repair candidate suggested by GMFT [3]. Then, we attempt to iteratively improve repair candidates over generations, following a genetic programming approach [10]. Algorithm 2 shows more details of our proposed algorithm. We explain Algorithm 2 step by step below.

Algorithm 2: TERA

Input: The set of segments and their corresponding issues and PDGs: S_i, I_i^j, PDG⟨S_i, I_i^j⟩ (i = 1, ..., n; j ∈ {1, 2, 3})
Output: Repair values for the root nodes of the PDGs

    x ← a candidate suggested by GMFT [3]
    sBest ← x
    bestCandidate ← x
    tabuList ← [ ]
    tabuList.push(x)
    while the termination condition has not been met do
        sNeighborhood ← getNeighbors(bestCandidate)
        for sCandidate ∈ sNeighborhood do
            if F(sCandidate) ≤ F(bestCandidate) and sCandidate ∉ tabuList then
                bestCandidate ← sCandidate
            end if
        end for
        if F(bestCandidate) ≤ F(sBest) then
            sBest ← bestCandidate
        end if
        tabuList.push(bestCandidate)
        if tabuList.size() > size_tabu then
            tabuList.removeFirst()
        end if
    end while
    return sBest

• Step 1: First, the algorithm creates a repair candidate as suggested by GMFT [3] and assigns this candidate as the best candidate, denoted by sBest. It then inserts sBest into tabuList, which is a list of the candidates that have been visited.

• Step 2: Let bestCandidate be the best repair candidate found in the previous iteration, and suppose that bestCandidate is represented by

$$bestCandidate = \{x_1^1, x_1^2, x_1^3, \dots, x_n^1, x_n^2, x_n^3\}$$

where $x_i^j$ is the value for repairing issue $I_i^j$ of segment $S_i$. The algorithm then determines the neighbors of bestCandidate, whose values fall into the following range: $\{x_1^1 \pm \delta, x_1^2 \pm \delta, x_1^3 \pm \delta, \dots, x_n^1 \pm \delta, x_n^2 \pm \delta, x_n^3 \pm \delta\}$. Among the neighbors, it assigns the one with the best fitness (i.e., the lowest value of F) as the bestCandidate of the current iteration. If bestCandidate achieves a better fitness than the current sBest, then sBest is replaced by bestCandidate. bestCandidate is then inserted into tabuList.

• Step 3: Step 2 is repeated until the termination condition is met. In this work, we use the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
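The steps above can be sketched in Python as follows. This is a minimal illustration that follows Algorithm 2 as written rather than the authors' Java implementation. The assumptions are: a candidate is a tuple of numbers, neighbors are drawn uniformly within ±δ of each coordinate, and, to keep the loop bounded, every drawn neighbor consumes one unit of the evaluation budget even if it is skipped as tabu. The name `tera_sketch` and the toy fitness function are hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, Optional, Sequence, Tuple

Candidate = Tuple[float, ...]


def tera_sketch(
    seed: Sequence[float],
    fitness: Callable[[Candidate], float],
    delta: float = 0.3,
    neighborhood_size: int = 10,
    tabu_size: int = 5,
    max_evals: int = 150,
    rng: Optional[random.Random] = None,
) -> Tuple[Candidate, float]:
    rng = rng or random.Random(0)
    # Step 1: start from the seed candidate and remember it as visited.
    s_best: Candidate = tuple(seed)
    s_best_fit = fitness(s_best)
    best_candidate, best_candidate_fit = s_best, s_best_fit
    # A bounded deque drops its oldest entry once tabu_size is exceeded.
    tabu: Deque[Candidate] = deque([s_best], maxlen=tabu_size)
    evals = 1

    # Step 3: repeat Step 2 until the budget is exhausted.
    while evals < max_evals:
        # Step 2: sample neighbors within +/- delta of the current best.
        base = best_candidate
        neighbors = [
            tuple(x + rng.uniform(-delta, delta) for x in base)
            for _ in range(neighborhood_size)
        ]
        for candidate in neighbors:
            if evals >= max_evals:
                break
            evals += 1  # simplification: tabu skips also consume budget
            if candidate in tabu:
                continue
            fit = fitness(candidate)
            if fit <= best_candidate_fit:
                best_candidate, best_candidate_fit = candidate, fit
        if best_candidate_fit <= s_best_fit:
            s_best, s_best_fit = best_candidate, best_candidate_fit
        tabu.append(best_candidate)
    return s_best, s_best_fit


def toy_objective(x: Candidate) -> float:
    """Stand-in for F(X): squared distance to the point (1, 1)."""
    return sum((v - 1.0) ** 2 for v in x)


tabu_best, tabu_best_fit = tera_sketch((0.0, 0.0), toy_objective)
```

Since a neighbor replaces bestCandidate only when it is at least as good, the search in this form never moves to a worse candidate, which is one plausible reason for the fast convergence reported for TERA.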

C. Implementation Detail

We implemented our approach in Java, consisting of approximately 5,000 lines of code. For identifying the mobile-friendly problems in a web page, we used well-established tools, namely the GMFT [3] and PSIT [4] APIs. For evaluating aesthetics, we identify segments in a web page and build the Segment Model and Intra-Segment Model by building a DOM tree, following a method similar to [14], with Chrome browser v60.0 and Selenium WebDriver.

VI. EXPERIMENTAL RESULT

In this section, we empirically compare our approach with the well-known automated web page repair algorithm MFix [14] on a dataset of real-world web pages. Below, we describe our experimental methodology, including the dataset and evaluation metrics, followed by the research questions and numerical results.

A. Dataset and Evaluation Metrics

To evaluate all algorithms, we perform the experiments on the Alexa dataset used in MFix [14]. The dataset contains 38 real-world subject web pages collected from the top 50 most visited websites across all seventeen categories. Table I presents the dataset's details, such as the URL and category of each web page. The dataset is publicly accessible in the GitHub repository of MFix†. In terms of evaluation, we use two evaluation metrics proposed by S. Mahajan et al. [14]:

• Usability Score: We use the Google PageSpeed Insights Tool [4], which assigns pages a score in the range of 0 to 100, with a score of at least 80 indicating a mobile-friendly page, to evaluate the usability of a web page. The higher the score, the better the usability of the web page.

• Aesthetic Score: The aesthetic score is a metric proposed in [14] for measuring how visually pleasing a web page looks. Based on the assumption that the original web pages have an aesthetic layout, it accounts for the differences in relative visual positioning within the segments of a page, as captured by the Segment Model and Intra-Segment Model, between the machine-patched and original web pages. In this way, the lower the aesthetic score, the more aesthetically pleasing the page is. Note that in this study, we chose to use automated validation, which measures the differences between the repaired page and the original page, instead of manual human evaluation to account for aesthetics. Although we acknowledge the imperfection of automated evaluation, human evaluation is expensive (e.g., the cost of hiring professional developers to ensure the quality of the manual evaluation) and still suffers from human biases. Automated validation, on the other hand, is less expensive and does not suffer from human biases. This is why we choose automated validation instead of manual human validation.

† https://github.com/USC-SQL/mfix/tree/master/ (directory: ICSE paper data, subjects)

For each web page, we run TERA, PBRA, and MFix ten times each and calculate the average to mitigate the non-determinism inherent in the approximation algorithms. To ensure fairness between algorithms, we also use the same number of evaluations (number of fitness calls) for all algorithms. We also perform Wilcoxon signed-rank statistical tests [25] to check whether the performance difference between our proposed solutions and the baseline is statistically significant. Our experiments were conducted on a MacBook Pro machine with a 2.2 GHz Intel Core i7 and 16 GB of RAM, running macOS Mojave 10.14.5.

B. Research Questions

We answer four research questions, as described below.

RQ1: How effective is our approach in repairing mobile-friendly problems in web pages as compared to MFix? In this research question, we investigate the effectiveness of our approach as compared to MFix in generating repairs on the benchmark dataset described in Section VI-A. Note that our comparison covers both mobile-friendliness and aesthetic layout.

RQ2: How do different fitness functions impact the effectiveness of PBRA and TERA? By default, our approach uses a fitness function in which $G(Y)$ is quadratic. In this research question, we compare different fitness functions to explain why we selected the quadratic function as the default.

RQ3: How efficient is TERA as compared to PBRA? In this question, we investigate the efficiency of TERA and PBRA by comparing their convergence rates.

RQ4: How do different population sizes impact the effectiveness of PBRA? By default, our approach uses an initial population size of 10. In this research question, we compare different values of the initial population size to explain why 10 is the best option.

TABLE I: Statistics of real-world web pages

ID   URL                                Category
1    https://www.discogs.com            Arts
2    https://xkcd.com                   Arts
3    http://www.wiley.com               Shopping
4    http://forum.gsmhosting.com/vbb    Home
5    https://www.irs.gov                Home
6    https://travel.state.gov           Home
7    https://arxiv.org                  Science
8    https://bitcointalk.org            Science
9    http://coinmarketcap.com           Science
10   http://www.intellicast.com         Science
11   https://www.ncbi.nlm.nih.gov       Science
12   http://sigmaaldrich.com            Science
13   http://www.weather.gov             Science
14   https://boardgamegeek.com          Games
15   http://www.finalfantasyxiv.com     Games
16   http://www.mmo-champion.com        Home
17   http://www.nexusmods.com           Games
18   http://nvidia.com                  Games
19   http://www.square-enix.com         Games
20   https://www.wowprogress.com        Games
21   https://bulbagarden.net            Kids and teens
22   http://lolcounter.com              Kids and teens
23   http://www.bom.gov.au              Kids and teens
24   http://onlinelibrary.wiley.com     Business
25   http://aamc.org                    Health
26   https://www.fragrantica.com        Health
27   http://us.battle.net               Kids and teens
28   http://blizzard.com                Kids and teens
29   http://drudgereport.com            News
30   https://www.irctc.co.in            Regional
31   http://dict.cc                     Reference
32   https://www.leo.org                Reference
33   http://correios.com.br             Society
34   http://www.flashscore.com          Sports
35   http://letour.fr                   Sports
36   http://rotoworld.com               Sports
37   http://us.soccerway.com            Sports
38   http://myway.com                   Computers

Fig. 2: Comparison of TERA, PBRA, and MFix in usability. The higher the usability score, the more mobile-friendly the page. Values above the red horizontal line drawn at 80 indicate that GMFT considers a page to be mobile-friendly.

TABLE II: Default settings for PBRA

Parameter                              Symbol      Value
Population size                        T           10
Number of evaluations                  E           150
Inertial coefficient (start)           ω_s         0.9
Inertial coefficient (end)             ω_e         0.4
Acceleration coefficient (personal)    c_1         0.5
Acceleration coefficient (global)      c_2         0.5
Coefficient of randomness              r_1, r_2    [0, 1]
Neighbor range                         ∆           0.3
Fitness function                       G(Y)        −Y^2

TABLE III: Default settings for TERA

Parameter                Symbol               Value
Neighbor size            size_neighborhood    10
Number of evaluations    E                    150
Size of tabu list        size_tabu            5
Neighbor range           δ                    0.3
Fitness function         G(Y)                 −Y^2

C. Numerical Results

RQ1: Our approach's effectiveness as compared to MFix. To answer RQ1, we compare the effectiveness of our approach, including PBRA and TERA (with the default settings in Table II and Table III), with a well-known automated repair technique for mobile-friendly problems, MFix [14]. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III is explained in RQ2 and RQ4.

Fig. 3: Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page is. Values below the red horizontal line drawn at 100 indicate improvements over the baseline.

Fig. 4: Impact of fitness function on convergence rate. (a) Usability. (b) Aesthetics.

Experimental results show that PBRA outperforms MFix by generating repairs that have much better usability and aesthetic scores. Note that, as described in Section VI-A, the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing a web page is. From Figure 2, we can see that our approach, namely PBRA, outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 presents the usability of the web pages repaired by TERA, PBRA, and MFix. The red line shows a threshold value of 80, which is the minimum PSIT score to be considered as having good usability by GMFT [3]. We can see that most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3. In Figure 3, we show the relative performance of our approaches versus the baseline technique MFix by dividing the aesthetic score of the PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We do this to normalize the results because of the wide range of aesthetic values, e.g., the aesthetic value can be any value greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects in aesthetics. TERA also shows considerably better results than MFix on 24 out of 38 subjects.
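The normalization described above amounts to a per-page ratio. The short Python sketch below expresses it as a percentage so that 100 corresponds to parity with MFix, which is an assumption based on the reference line mentioned in the caption of Figure 3. The function name and the scores are hypothetical placeholders, not measured results.

```python
from typing import List, Sequence


def relative_aesthetics(ours: Sequence[float],
                        mfix: Sequence[float]) -> List[float]:
    """Per-page aesthetic score as a percentage of the MFix score.

    Assumes every MFix score is greater than zero.
    """
    return [100.0 * o / m for o, m in zip(ours, mfix)]


# Hypothetical scores for two pages.
ratios = relative_aesthetics([50, 120], [100, 100])
improved = sum(1 for r in ratios if r < 100.0)
print(ratios, improved)  # [50.0, 120.0] 1
```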

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results point out that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003 and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041 and effect sizes of 0.991 and 0.435, respectively).
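For readers unfamiliar with the test, the following Python sketch computes the Wilcoxon signed-rank statistics for paired samples. It is a simplified illustration under stated assumptions: zero differences are discarded, tied absolute differences receive their average rank, and no p-value or effect size is derived. The scores in the example are hypothetical and are not the paper's measurements.

```python
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float],
                         b: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-): rank sums of positive and negative differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical usability scores of two techniques on five pages.
scores_a = [85, 90, 78, 92, 88]
scores_b = [80, 91, 70, 92, 84]
print(wilcoxon_signed_rank(scores_a, scores_b))  # (9.0, 1.0)
```

A large imbalance between the two rank sums suggests a systematic difference between the paired samples; a statistics library would normally be used to turn the statistic into a p-value.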

RQ2: Impacts of the fitness function. In Section IV, we introduce a novel fitness function to combine the usability score and the aesthetic score. In the formula, the function $G(Y)$ plays an important role in making adjustments to speed up the search for mobile-friendly patches. However, $G(Y)$ also needs to be designed to avoid ignoring potentially good candidate patches.

We have chosen some functions that satisfy the conditions on $G(Y)$ (in Section IV) to experiment with, such as:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$

We compare the effectiveness of the fitness functions on two tasks: convergence rate, i.e., how fast a fitness function helps the approaches converge, and overall result. For the overall result, from Figure 4, we can see that the quadratic function is the most effective, followed by the cubic function and the exponential function. For the convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations is increased, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time to achieve better results. One potential reason could be that the exponential fitness function focuses on improving the usability score to exceed the threshold, so that it ignores potentially good candidate patches. In conclusion, the quadratic function is the best choice of $G(Y)$ for both PBRA and TERA. We have therefore chosen the quadratic function as the default for our approach.

RQ3: Convergence rates of PBRA and TERA. RQ1 has demonstrated that PBRA outperforms TERA (on 36 out of 38 subjects in the mobile-friendliness test and 30 out of 38 in aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to that shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges when the number of evaluations is 50, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA only needs 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of friendliness for repaired pages, but TERA consumes less running time.

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both of them converge when the number of evaluations is 150. The converged value of PBRA is smaller than that of TERA. Specifically, PBRA attains an aesthetic score that equals 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of population size. Population size plays an essential role in population-based meta-heuristic methods. A small population size may not be sufficient to find good results, while a large population size may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. From the figure, we can see that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 show worse results. Small populations are more likely to experience a loss of diversity over time because mating between individuals with similar genetic structures occurs regularly. Therefore, the search space of the algorithm is narrowed, which usually leads to poor results, like the result of the population size of 5 in this case. In contrast, large populations could make the algorithm consume more computation time in finding an optimal solution. This is why the algorithm with a population size of 15 yields poor results. This observation is further reinforced by the experiment on the overall result, which is shown in Figure 7 with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity. Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.

Internal Validity. Threats to internal validity refer to errors in our implementation and experiments. To mitigate this risk, we have carefully examined our code and experiments to avoid potential errors. We have also repeated our experiments several times to ensure that the reported results are correct.

Construct Validity. Threats to construct validity correspond to the suitability of our evaluation metrics. Following MFix, we consider a web page mobile-friendly if it passes a threshold value of 80, which is the minimum PSIT score for good usability. These ratings are, however, subject to the PSIT evaluation criteria only. Although PSIT is a popular tool for assessing usability, we plan to also include other tools to assess usability more thoroughly.

Fig. 5: Comparison of the convergence rates of TERA and PBRA. (a) Usability. (b) Aesthetics.

Fig. 6: Impact of population size on convergence rate. (a) Usability. (b) Aesthetics.

VIII. CONCLUSION

In this paper, we introduced automated approaches based on meta-heuristic algorithms for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics) at the same time. To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluations on a dataset of 38 real-world defective web pages show that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique, MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithms on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function that optimizes for both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

Fig. 7: Impact of population size on overall result. (a) Usability. (b) Aesthetics.

REFERENCES

[1] "Bing Mobile Friendly Test Tool," https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved: March 2020.

[2] "eMarketer Releases Updated Estimates for US Digital Users," https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved: March 2020.

[3] "Google Mobile Friendly Test Tool," https://search.google.com/test/mobile-friendly, 2017, retrieved: March 2020.

[4] "Google PageSpeed Insights Tool," https://developers.google.com/speed/pagespeed/insights/, 2017, retrieved: March 2020.

[5] "Google 2018 Consumer Study," https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved: March 2020.

[6] M. E. Akpinar and Y. Yesilada, "Vision based page segmentation algorithm: Extended and perceived success," in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to webpage segmentation," in Proceedings of the 17th International Conference on World Wide Web, ser. WWW '08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, "Tabu search - part I," ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, "S3: Syntax- and semantic-guided repair synthesis via programming by examples," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, "History driven program repair," in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: A generic method for automatic software repair," IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, "Automated repair of mobile friendly problems in web pages," in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE '18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, "Automated repair of layout cross browser issues using search-based techniques," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, "Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques," Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, "Angelix: Scalable multiline program patch synthesis via symbolic analysis," in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, "SemFix: Program repair via semantic analysis," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, "Auto-locating and fix-propagating for HTML validation errors to PHP server-side code," in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, "Automated reasoning for web page layout," in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, "Automatic partitioning of web pages using clustering," in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, "Automated repair of HTML generation errors in PHP applications using string constraint solving," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, "Block-o-Matic: A web page segmentation framework," in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, "Automating presentation changes in dynamic web applications via collaborative hybrid analysis," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.


Fig. 1: Repaired page across several runs by MFix. (a) Run 1. (b) Run 2. (c) Run 3. (d) Run 4.

and aesthetics altogether. By doing this, our work is unique in tackling the aforementioned problem. In the next section, we describe the details of our proposed framework.

III. OUR FRAMEWORK

Our repair framework consists of three phases: segmentation, localization, and repair. The main novelty of our framework lies in the third phase, i.e., repair, while the first two phases are inspired by [14]. Below, we describe each phase in more detail.

The segmentation phase groups the elements of a web page into regions. Each region is a set of HTML elements that should be repaired together to guarantee the visual consistency of the repaired page. This can be achieved by readily available tools such as VIPS [6], Block-o-Matic [23], correlation clustering [7], and cluster-based partitioning [21]. In this work, we leverage the automated clustering-based partitioning algorithm proposed by Romero et al. [21] as our segmentation tool. This approach is based on the Document Object Model (DOM) tree of the page, in which each leaf element of the DOM tree is first assigned to its own segment. Then, segments are merged if they are close enough, i.e., their distance, measured by the average depth of leaves in the DOM tree, is below a predefined threshold.

The localization phase consists of two steps. The first step classifies the issues associated with the problematic segments that should be repaired. There are five main types of issues, namely font sizing, tap target spacing, content sizing, viewport configuration, and Flash usage, as reported by popular mobile testing tools [4], [1]. In this paper, we focus only on the first three of these problems, following MFix [14]. The result of this step is a set of tuples of the form $\langle s, T \rangle$, where $s$ is a potentially problematic segment and $T$ is the set of issues associated with $s$. Note that $T$ is a subset of {font sizing, tap target spacing, content sizing}. To achieve this, the Google Mobile-Friendly Test Tool (GMFT) can be used.

Next, the second step determines the CSS properties that may need to be adjusted to patch the issues of the problematic segments. For example, a font-sizing issue may be patched by adjusting the font-size, line-height, width, and height properties. As all the elements in the same segment are interrelated, their CSS properties should be adjusted together. To address this problem, MFix [14] proposed a structure named the Property Dependence Graph (PDG), which represents the relationship between elements in a segment with respect to a given issue type. Specifically, each node of a PDG corresponds to an HTML element in the segment. Two nodes in the PDG are connected if there exists a dependency relationship between them. A function $M$ maps each edge of the PDG to the ratio between the values of the CSS properties associated with the two end nodes of the edge. Intuitively, if we know the value corresponding to the root node of a PDG, we can deduce the values corresponding to the other nodes by using the function $M$. This step results in a set of segments and their corresponding PDGs.
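The propagation enabled by $M$ can be sketched as follows. This is a minimal illustration in Python, not the MFix implementation: the dictionary-based graph, the function name `propagate_from_root`, and the convention that each edge ratio equals the child's value divided by the parent's value are all assumptions made for the example.

```python
from collections import deque


def propagate_from_root(root, root_value, children, ratio):
    """Deduce a value for every PDG node from the value chosen for the root.

    children: maps a node to the nodes that depend on it (hypothetical layout).
    ratio:    maps an edge (parent, child) to child_value / parent_value,
              playing the role of the function M (assumed orientation).
    """
    values = {root: root_value}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in values:  # already reached through another edge
                continue
            values[child] = values[node] * ratio[(node, child)]
            queue.append(child)
    return values


# Toy segment: a <div> root whose font size drives three descendants.
children = {"div": ["h1", "p"], "p": ["span"]}
ratio = {("div", "h1"): 1.5, ("div", "p"): 0.75, ("p", "span"): 0.5}
values = propagate_from_root("div", 16.0, children, ratio)
```

Under these assumptions, the repair phase only has to search over one value per PDG (the root's), and every other element in the segment follows from it, which keeps the relative proportions of the original design.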

The last phase, i.e., repair, determines the optimal values of the CSS properties for repairing all the identified issues. In this paper, we focus on this phase and propose two algorithms, which are described in the following sections.

IV. FORMULATION AND FITNESS FUNCTION

The core of our algorithms is the fitness function that guides the search for the best solution during the evolution of repair candidates. A well-designed fitness function leads to better repair candidates, and thus we carefully crafted the fitness function as explained below. We first give the detailed formula of the fitness function and then explain the high-level idea behind its design.

Given the potentially problematic segments, their issues, and the corresponding PDGs identified by the segmentation and localization algorithms described in the previous section, our objective is to determine the optimal patching solution to address all the issues. Let $S$ denote the set of all segments, $S = \{S_1, \ldots, S_n\}$. For each $S_i$ ($i = 1, \ldots, n$), let $I_i$ be the set of issues associated with $S_i$. Let $I_i^j$ be an issue belonging to $I_i$; then $j \in \{1, 2, 3\}$, where $I_i^1$, $I_i^2$, and $I_i^3$ correspond to issues related to font sizing, tap target spacing, and content sizing, respectively. Let $PDG\langle S_i, I_i^j \rangle$ denote the PDG corresponding to $S_i$ and $I_i^j$. As mentioned in Section III, to fix issue $I_i^j$, we just need to determine the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$ (the other elements can then be adjusted by using the PDG). Our problem can be mathematically formulated as follows.

Input: $S_i$, $I_i^j$, $PDG\langle S_i, I_i^j \rangle$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)

Output: $X = \{X_i^j\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$), where $X_i^j$ denotes the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$

Objectives:
  Maximize the usability score $U(X)$
  Minimize the aesthetic score $A(X)$

where $U(X)$ indicates the mobile friendliness of the page, which can be measured by using PSIT, and the aesthetic score $A(X)$ quantifies the layout differences between the original page and a transformed page to which a candidate patch has been applied. The aesthetic score is measured based on the Segment Model (SM) and the Intra-Segment Model (ISM), where:

• A Segment Model (SM) is defined as a directed complete graph in which the nodes are the segments found in the segmentation phase and the edges are labeled with layout relationships of the following kinds: (1) intersection, (2) containment, and (3) directional (i.e., above, below, left, and right).

• An Intra-Segment Model (ISM) is similar to the SM but is built for each segment, and its nodes are the HTML elements within the segment.

Using these models, the aesthetic score is computed by summing the sizes of the symmetric differences between each edge's labels in the SM and ISMs of the original page and those of the transformed page [14].
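The computation described above can be sketched as follows. This is an illustrative Python sketch rather than the implementation used in [14]: representing each model as a dictionary from a directed edge to a set of label strings, and the names `label_difference` and `aesthetic_score`, are assumptions made for the example.

```python
def label_difference(original, transformed):
    """Sum, over all edges, the size of the symmetric difference of label sets."""
    edges = original.keys() | transformed.keys()
    return sum(
        len(original.get(edge, set()) ^ transformed.get(edge, set()))
        for edge in edges
    )


def aesthetic_score(sm_original, sm_transformed, isms_original, isms_transformed):
    """Add the SM difference to the ISM difference of every segment."""
    score = label_difference(sm_original, sm_transformed)
    for segment, ism_original in isms_original.items():
        score += label_difference(ism_original, isms_transformed.get(segment, {}))
    return score


# Toy page with two segments; the patch makes them overlap and moves a caption.
sm_before = {("header", "content"): {"above"}, ("content", "header"): {"below"}}
sm_after = {
    ("header", "content"): {"above", "intersection"},
    ("content", "header"): {"below", "intersection"},
}
isms_before = {"content": {("img", "caption"): {"above"}}}
isms_after = {"content": {("img", "caption"): {"left"}}}

score_changed = aesthetic_score(sm_before, sm_after, isms_before, isms_after)
score_same = aesthetic_score(sm_before, sm_before, isms_before, isms_before)
```

In this toy case the two new "intersection" labels contribute 2 and the caption's change from "above" to "left" contributes another 2, while an unchanged page scores 0, which matches the convention that a lower aesthetic score means less layout disruption.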

This is a multi-objective optimization problem, and thus we define the total fitness function as follows:

$$F(X) = \alpha \times E(U(X) - 80) + \beta A(X),$$

where $\alpha$ and $\beta$ are negative and positive weight factors, respectively, that can be tuned to obtain the best performance, and $E(Y)$ is a function defined as

$$E(Y) = \begin{cases} G(Y) & \text{if } Y < 0, \\ Y - 1 & \text{otherwise.} \end{cases}$$

We explain the high-level idea of the above fitness function as follows. First, it is worth noting that the maximum mobile-friendliness score given by the Google PageSpeed Insights Tool is 100, and that the usability score should be at least 80 to guarantee the page's friendliness, as indicated in [14]. Motivated by this observation, we design the first term of the fitness function (i.e., the one representing the usability score) such that its value is significantly large when the usability score is smaller than the threshold of 80. Therefore, $G(Y)$ should be a reciprocal function that increases rapidly compared to a linear function. The selection of $G(Y)$ is discussed in detail in Section VI.

Note that, as $\alpha$ is a negative real number and $\beta$ is a positive real number, maximizing $U(X)$ and minimizing $A(X)$ correspond to minimizing $F(X)$. Accordingly, the objective now is to minimize the fitness function $F(X)$.
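To make the shape of $F(X)$ concrete, the sketch below evaluates it in Python for given usability and aesthetic scores. It uses the quadratic choice $G(Y) = -Y^2$ that the paper adopts as its default; the weights `alpha = -1.0` and `beta = 1.0` and all function names are hypothetical, since the tuned weight values are not stated here.

```python
def quadratic_g(y):
    """Default choice of G reported in the paper: G(Y) = -Y^2."""
    return -(y ** 2)


def penalty_e(y, g=quadratic_g):
    """E(Y): apply G below the threshold, a linear term otherwise."""
    return g(y) if y < 0 else y - 1


def fitness(usability, aesthetic, alpha=-1.0, beta=1.0, g=quadratic_g,
            threshold=80.0):
    """F = alpha * E(U - threshold) + beta * A, to be minimized.

    alpha (negative) and beta (positive) are placeholder weights.
    """
    return alpha * penalty_e(usability - threshold, g) + beta * aesthetic


below = fitness(usability=70.0, aesthetic=5.0)  # 10 points under the threshold
above = fitness(usability=90.0, aesthetic=5.0)  # 10 points over the threshold
```

With these placeholder weights, the page scoring 70 receives a fitness of 105.0 while the page scoring 90 receives -4.0, so a minimizing search is pushed strongly toward candidates that clear the threshold before it starts trading usability against aesthetics.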

V. PROPOSED ALGORITHMS

The core of our method is a Particle Swarm Optimization (PSO) algorithm that repairs web pages by searching for the values that optimize the fitness function proposed in Section IV. PSO iteratively tries to improve candidate solutions, here dubbed particles, by moving these particles around in the search space according to simple mathematical formulae over the particles' position and velocity. Using PSO instead of random search helps to leverage knowledge about the changing trends of particles, improving and directing particles in the next iterations. PSO often provides high-quality solutions, but it may have a low convergence rate, which can be time-consuming. To tackle this issue, we also propose a second approach that optimizes for running time. This second approach is based on the Tabu search algorithm, which shows a higher convergence rate than PSO, as will be discussed in Section VI.

In the following, we describe the proposed algorithms in detail.

A. PBRA: A PSO-Based Repair Algorithm

In this section, we leverage the Particle Swarm Optimization approach to determine the optimal values for the CSS properties that need to be adjusted. Basically, PSO simulates the behaviour of bird flocking. In PSO, every single solution can be seen as a bird, which we call a particle. The goodness of every particle is measured by a fitness value computed by a fitness function. All of the particles have velocities that direct their flight by following the current optimum particles. PSO starts with a set of random particles and searches for the optimum by updating generations. In each iteration, each particle is updated by following the best values found so far. Algorithm 1 presents the details of our proposed algorithm, which we describe in more detail below.

Let us assume that we have a set of $T$ particles, denoted as $P = \{P_1, \ldots, P_T\}$. The algorithm then follows the steps below.

Algorithm 1: PBRA

Input: The set of segments and their corresponding issues and PDGs: $S_i$, $I_i^j$, $PDG\langle S_i, I_i^j \rangle$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)
Output: Repair values for the root nodes of the PDGs

  $x \leftarrow$ a candidate suggested by GMFT [3]
  Initialize the first generation of $T$ particles, $P^0 = \{P_1^0, P_2^0, \ldots, P_T^0\}$, by using a Gaussian distribution around $x$
  $gBest \leftarrow P_1^0$
  for $P_t^0 \in P^0$ do
    $v^0(P_t) \leftarrow$ random number in $(0, 1)$
    $pBest_t \leftarrow P_t^0$
    if $F(pBest_t) \le F(gBest)$ then
      $gBest \leftarrow pBest_t$
    end if
  end for
  $k \leftarrow 0$
  while the termination condition has not been met do
    for $P_t^k \in P^k$ do
      $v^{k+1}(P_t) \leftarrow Update(v^k(P_t), pBest_t, gBest, P_t^k)$
      $P_t^{k+1} \leftarrow P_t^k + v^{k+1}(P_t)$
      if $F(P_t^{k+1}) < F(pBest_t)$ then
        $pBest_t \leftarrow P_t^{k+1}$
      end if
      if $F(pBest_t) \le F(gBest)$ then
        $gBest \leftarrow pBest_t$
      end if
    end for
    $k \leftarrow k + 1$
  end while
  return $gBest$

• Step 1: The first step in a Particle Swarm Optimization is the generation of an initial population. Each particle of the initial population represents a possible solution to the problem. The initialization process is often designed to ensure the diversity of the population, which plays an important role in population-based algorithms. In mobile-friendly problems, we create the first particle by using the value suggested by GMFT [3]. Then, we generate the initial population by perturbing the adjustment factors based on a Gaussian distribution around the original values of the first particle.

• Step 2: Let $gBest$ denote the best value obtained so far by all particles in the population, and let $pBest_t$ denote the best value that the $t$-th particle (i.e., $P_t$) has achieved so far. If the current value of the $t$-th particle is better than $pBest_t$, then $pBest_t$ is replaced by the current value of the $t$-th particle. The algorithm then searches over all particles to find the candidate with the best fitness (i.e., the lowest value of $F$) and assigns it to $gBest$.

• Step 3: The algorithm leverages information obtained from previous generations to improve the quality of the particles by adjusting each particle's direction based on $gBest$ and $pBest_t$, which are determined in Step 2. In detail, the velocity and position updates for every particle are defined as follows. The velocity of particle $P_t$ at iteration $k + 1$ (denoted as $v^{k+1}(P_t)$) is determined as

$$v^{k+1}(P_t) = w \times v^k(P_t) + c_1 \times r_1 \times (pBest_t - P_t^k) + c_2 \times r_2 \times (gBest - P_t^k),$$

where $v^k(P_t)$ is the velocity of particle $P_t$ at the previous iteration, $w$ is the inertia coefficient, $c_1$ and $c_2$ are the personal and global acceleration coefficients, and $r_1$ and $r_2$ are random coefficients in $[0, 1]$ (see Table II). Based on the velocity, particle $P_t$ updates its value as follows:

$$P_t^{k+1} = P_t^k + v^{k+1}(P_t).$$

• Step 4: Steps 2 and 3 are repeated until the termination condition is satisfied. The final value of $gBest$ is then deemed the optimal solution. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
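The four steps above can be sketched as a generic minimizing PSO. This Python sketch is illustrative and is not the authors' Java implementation: the function name `pso_minimize`, the perturbation width `sigma`, the constant inertia weight, and the toy `sphere` objective standing in for $F$ are assumptions; only the update rule and the evaluation-count stopping rule follow the text.

```python
import random


def pso_minimize(fitness, seed_candidate, population=10, max_evaluations=150,
                 w=0.9, c1=0.5, c2=0.5, sigma=0.3, rng_seed=0):
    """Minimize `fitness` with a PSO seeded by one suggested candidate."""
    rng = random.Random(rng_seed)
    dim = len(seed_candidate)
    # Step 1: the first particle is the suggested candidate; the rest are
    # Gaussian perturbations around it (sigma is a hypothetical width).
    particles = [list(seed_candidate)] + [
        [x + rng.gauss(0.0, sigma) for x in seed_candidate]
        for _ in range(population - 1)
    ]
    velocities = [[rng.random() for _ in range(dim)] for _ in particles]
    # Step 2: personal bests and the global best (lower fitness is better).
    p_best = [list(p) for p in particles]
    p_best_f = [fitness(p) for p in particles]
    evaluations = population
    best = min(range(population), key=p_best_f.__getitem__)
    g_best, g_best_f = list(p_best[best]), p_best_f[best]
    # Step 4: stop once the evaluation budget is spent.
    while evaluations < max_evaluations:
        for t in range(population):
            if evaluations >= max_evaluations:
                break
            # Step 3: velocity and position updates.
            r1, r2 = rng.random(), rng.random()
            velocities[t] = [
                w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
                for v, x, pb, gb in zip(velocities[t], particles[t],
                                        p_best[t], g_best)
            ]
            particles[t] = [x + v for x, v in zip(particles[t], velocities[t])]
            f = fitness(particles[t])
            evaluations += 1
            if f < p_best_f[t]:
                p_best[t], p_best_f[t] = list(particles[t]), f
            if p_best_f[t] <= g_best_f:
                g_best, g_best_f = list(p_best[t]), p_best_f[t]
    return g_best, g_best_f


def sphere(x):
    """Toy stand-in for F: squared distance from the point (1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


start = [3.0, -2.0]
best_x, best_f = pso_minimize(sphere, start)
```

Because the suggested candidate is itself one of the particles, the returned solution can never be worse than the starting point. The inertia weight is held constant here for brevity.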

B. TERA: A Time-Efficient Repair Algorithm

In this section, we present the details of a repair algorithm based on Tabu search. The insight of our algorithm is that we start with a repair candidate suggested by GMFT [3]. Then, we attempt to iteratively improve repair candidates over generations, following a genetic programming approach [10]. Algorithm 2 shows more details of our proposed algorithm. We explain Algorithm 2 step by step below.

Algorithm 2: TERA

Input: The set of segments and their corresponding issues and PDGs: $S_i$, $I_i^j$, $PDG\langle S_i, I_i^j \rangle$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)
Output: Repair values for the root nodes of the PDGs

  x ← a candidate suggested by GMFT [3]
  sBest ← x
  bestCandidate ← x
  tabuList ← [ ]
  tabuList.push(x)
  while the termination condition has not been met do
    sNeighborhood ← getNeighbors(bestCandidate)
    for sCandidate ∈ sNeighborhood do
      if F(sCandidate) ≤ F(bestCandidate) and sCandidate ∉ tabuList then
        bestCandidate ← sCandidate
      end if
    end for
    if F(bestCandidate) ≤ F(sBest) then
      sBest ← bestCandidate
    end if
    tabuList.push(bestCandidate)
    if tabuList.size() > size_tabu then
      tabuList.removeFirst()
    end if
  end while
  return sBest

• Step 1: First, the algorithm creates a repair candidate suggested by GMFT [3] and assigns this candidate as the best candidate, denoted by sBest. It then inserts sBest into tabuList, which is a list of the candidates that have been visited.

• Step 2: Let bestCandidate be the best repair candidate found in the previous iteration, and suppose that bestCandidate is represented by

$$bestCandidate = (x_1^1, x_1^2, x_1^3, \ldots, x_n^1, x_n^2, x_n^3),$$

where $x_i^j$ is the value for repairing issue $I^j$ of segment $S_i$. The algorithm then determines the neighbors of bestCandidate, whose values fall into the range $(x_1^1 \pm \delta, x_1^2 \pm \delta, x_1^3 \pm \delta, \ldots, x_n^1 \pm \delta, x_n^2 \pm \delta, x_n^3 \pm \delta)$. Among the neighbors, it then assigns the one with the best fitness (i.e., the lowest value of $F$) as bestCandidate of the current iteration. If bestCandidate achieves a better fitness than the current sBest, then sBest is replaced by bestCandidate. bestCandidate is then inserted into tabuList.

• Step 3: Step 2 is repeated until the termination condition is met. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
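The steps above can be sketched as follows. This is an illustrative Python sketch of the loop in Algorithm 2, not the authors' Java implementation: the name `tabu_search`, the uniform sampling of neighbors within $\pm\delta$, and the toy `sphere` objective standing in for $F$ are assumptions, and a candidate is treated as tabu only on an exact match.

```python
import random
from collections import deque


def tabu_search(fitness, seed_candidate, delta=0.3, neighborhood_size=10,
                tabu_size=5, max_evaluations=150, rng_seed=0):
    """Minimize `fitness` starting from one suggested candidate."""
    rng = random.Random(rng_seed)
    # Step 1: the suggested candidate is both the current and the overall best.
    best_candidate = tuple(seed_candidate)
    best_candidate_f = fitness(best_candidate)
    s_best, s_best_f = best_candidate, best_candidate_f
    evaluations = 1
    # A bounded deque drops its oldest entry once tabu_size is exceeded.
    tabu = deque([best_candidate], maxlen=tabu_size)
    # Step 3: stop once the evaluation budget is spent.
    while evaluations < max_evaluations:
        # Step 2: sample neighbors within +/- delta of the current candidate
        # (uniform sampling is an assumption of this sketch).
        center = best_candidate
        neighbors = [
            tuple(x + rng.uniform(-delta, delta) for x in center)
            for _ in range(neighborhood_size)
        ]
        for candidate in neighbors:
            if evaluations >= max_evaluations:
                break
            candidate_f = fitness(candidate)
            evaluations += 1
            if candidate_f <= best_candidate_f and candidate not in tabu:
                best_candidate, best_candidate_f = candidate, candidate_f
        if best_candidate_f <= s_best_f:
            s_best, s_best_f = best_candidate, best_candidate_f
        tabu.append(best_candidate)
    return list(s_best), s_best_f


def sphere(x):
    """Toy stand-in for F: squared distance from the point (1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


start = [3.0, -2.0]
best_x, best_f = tabu_search(sphere, start)
```

Each iteration only explores a small box around the current candidate, which is one plausible reason such a local search can settle faster than a swarm while being more likely to stop at a weaker solution.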

C. Implementation Detail

We implemented our approach in Java, consisting of approximately 5000 lines of code. For identifying the mobile-friendly problems in a web page, we used the APIs of well-established tools such as GMFT [3] and PSIT [4]. For evaluating aesthetics, we identify the segments in a web page and build the Segment Model and Intra-Segment Model by building a DOM tree, following a method similar to that of [14], with Chrome browser v60.0 and Selenium WebDriver.

VI. EXPERIMENTAL RESULT

In this section, we empirically compare our approach with the well-known automated web page repair algorithm MFix [14] on a dataset of real-world web pages. Below, we describe our experimental methodology, including the dataset and evaluation metrics, followed by the research questions and numerical results.

A. Dataset and Evaluation Metrics

To evaluate all algorithms, we perform the experiments on the Alexa dataset used in MFix [14]. The dataset contains 38 real-world subject web pages collected from the top 50 most visited websites across all seventeen categories. Table I presents the dataset's details, such as the URL and category of each web page. The dataset can be publicly accessed in the GitHub repository of MFix.† In terms of evaluation, we use two evaluation metrics proposed by Mahajan et al. [14]:

• Usability Score: We use the Google PageSpeed Insights Tool [4], which assigns pages a score in the range of 0 to 100, with a score of at least 80 indicating a mobile-friendly page, to evaluate the usability of a web page. The higher the score, the better the usability of the web page.

• Aesthetic Score: The aesthetic score is a metric proposed in [14] for measuring how visually pleasing a web page looks. Based on the assumption that the original web pages have an aesthetic layout, it is computed by comparing the relative visual positioning among and within the segments of a page, as captured by the Segment Model and Intra-Segment Model, between the machine-patched and original web pages. In this way, the lower the aesthetic score, the more aesthetically pleasing the page. Note that in this study, we chose to use automated validation, which measures the differences between the repaired page and the original page, instead of manual human evaluation to account for aesthetics. Although we acknowledge the imperfection of automated evaluation, human evaluation is expensive (e.g., the cost of hiring professional developers to ensure the quality of the manual evaluation) and still suffers from human biases. Automated validation, on the other hand, is less expensive and does not suffer from human biases. This is why we choose automated validation instead of manual human validation.

† https://github.com/USC-SQL/mfix/tree/master/ICSE_paper_data/subjects

For each web page, we run TERA, PBRA, and MFix ten times each and calculate the average to mitigate the non-determinism inherent in the approximation algorithms. To ensure fairness between algorithms, we also use the same number of evaluations (number of fitness calls) for all algorithms. We also perform Wilcoxon signed-rank statistical tests [25] to check whether the performance difference between our proposed solutions and the baseline is statistically significant. Our experiments were conducted on a MacBook Pro machine with a 2.2 GHz Intel Core i7 and 16 GB of RAM, running macOS Mojave 10.14.5.

B. Research Question

We answer four research questions, as described below.

RQ1: How effective is our approach in repairing mobile-friendly problems in web pages as compared to MFix? In this research question, we investigate the effectiveness of our approach, as compared to MFix, in generating repairs for the benchmark dataset described in Section VI-A. Note that our comparison covers mobile-friendliness as well as aesthetic layout.

RQ2: How do different fitness functions impact the effectiveness of PBRA and TERA? By default, our approach uses a fitness function in which $G(Y)$ is quadratic. In this research question, we compare different fitness functions to explain why we selected the quadratic function as the default.

RQ3: How efficient is TERA as compared to PBRA? In this question, we compare the efficiency of TERA and PBRA by comparing their convergence rates.

RQ4: How do different population sizes impact the effectiveness of PBRA? By default, our approach uses an initial population size of 10. In this research question, we compare different values of the initial population size to explain why 10 is the best option.

TABLE I: Statistics of real-world web pages

| ID | URL | Category |
|----|-----|----------|
| 1 | https://www.discogs.com | Arts |
| 2 | https://xkcd.com | Arts |
| 3 | http://www.wiley.com | Shopping |
| 4 | http://forum.gsmhosting.com/vbb | Home |
| 5 | https://www.irs.gov | Home |
| 6 | https://travel.state.gov | Home |
| 7 | https://arxiv.org | Science |
| 8 | https://bitcointalk.org | Science |
| 9 | http://coinmarketcap.com | Science |
| 10 | http://www.intellicast.com | Science |
| 11 | https://www.ncbi.nlm.nih.gov | Science |
| 12 | http://sigmaaldrich.com | Science |
| 13 | http://www.weather.gov | Science |
| 14 | https://boardgamegeek.com | Games |
| 15 | http://www.finalfantasyxiv.com | Game |
| 16 | http://www.mmo-champion.com | Home |
| 17 | http://www.nexusmods.com | Games |
| 18 | http://nvidia.com | Games |
| 19 | http://www.square-enix.com | Games |
| 20 | https://www.wowprogress.com | Games |
| 21 | https://bulbagarden.net | Kids and teens |
| 22 | http://lolcounter.com | Kids and teens |
| 23 | http://www.bom.gov.au | Kids and teens |
| 24 | http://onlinelibrary.wiley.com | Business |
| 25 | http://aamc.org | Health |
| 26 | https://www.fragrantica.com | Health |
| 27 | http://us.battle.net | Kids and teens |
| 28 | http://blizzard.com | Kids and teens |
| 29 | http://drudgereport.com | News |
| 30 | https://www.irctc.co.in | Regional |
| 31 | http://dict.cc | Reference |
| 32 | https://www.leo.org | Reference |
| 33 | http://correios.com.br | Society |
| 34 | http://www.flashscore.com | Sports |
| 35 | http://letour.fr | Sports |
| 36 | http://rotoworld.com | Sports |
| 37 | http://us.soccerway.com | Sports |
| 38 | http://myway.com | Computers |

Fig. 2: Comparison of TERA, PBRA, and MFix in usability. The higher the usability score, the more mobile-friendly the page. Values above the red horizontal line drawn at 80 indicate that GMFT considers a page to be mobile-friendly.

TABLE II: Default setting for PBRA

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Population size | $T$ | 10 |
| Number of evaluations | $E$ | 150 |
| Inertia coefficient (start) | $\omega_s$ | 0.9 |
| Inertia coefficient (end) | $\omega_e$ | 0.4 |
| Acceleration coefficient (personal) | $c_1$ | 0.5 |
| Acceleration coefficient (global) | $c_2$ | 0.5 |
| Coefficient of randomness | $r_1, r_2$ | $[0, 1]$ |
| Neighbor range | $\Delta$ | 0.3 |
| Fitness function | $G(Y)$ | $-Y^2$ |

TABLE III: Default setting for TERA

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Neighbor size | $size_{neighborhood}$ | 10 |
| Number of evaluations | $E$ | 150 |
| Size of tabu list | $size_{tabu}$ | 5 |
| Neighbor range | $\delta$ | 0.3 |
| Fitness function | $G(Y)$ | $-Y^2$ |

C. Numerical Result

RQ1: Our approach's effectiveness as compared to MFix. To answer RQ1, we compare the effectiveness of our approach, including PBRA and TERA (with the default settings in Table II and Table III), with MFix [14], a well-known automated repair technique for mobile-friendly problems. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III will be explained in RQ3 and RQ4.

Fig. 3: Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page. Values below the red horizontal line drawn at 100 indicate improvements over the baseline.

Fig. 4: Impact of fitness function on convergence rate. Panels: (a) Usability, (b) Aesthetics.

Experimental results show that PBRA outperforms MFix by generating repairs that have much better usability and aesthetic scores. Recall from Section VI-A that the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing a web page is. From Figure 2, we can see that PBRA outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 presents the usability of the web pages repaired by TERA, PBRA, and MFix. The red line shows a threshold value of 80, which is the minimum PSIT score to be considered as having good usability by GMFT [3]. We can see that most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3. In Figure 3, we show the relative performance of our approaches versus the baseline technique MFix by dividing the aesthetic score of the PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We do this to normalize the results because of the wide range of aesthetic values, e.g., the aesthetic value can be any value greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects in aesthetics. TERA also shows considerably better results than MFix on 24 out of 38 subjects.

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results point out that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003, and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041, and effect sizes of 0.991 and 0.435, respectively).

RQ2: Impacts of the fitness function. In Section IV, we introduced a novel fitness function to combine the usability score and the aesthetic score. In the formula, the function $G(Y)$ plays an important role in making adjustments to speed up the search for mobile-friendly patches. However, $G(Y)$ also needs to be designed to avoid ignoring potentially good candidate patches.

We have chosen several functions that satisfy the conditions on $G(Y)$ (see Section IV) to experiment with:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$

We compare the effectiveness of the fitness functions on two tasks: convergence rate, i.e., how fast a fitness function helps the approaches to converge, and overall result. For the overall result, Figure 4 shows that the quadratic function is the most effective, followed by the cubic function and the exponential function. For the convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations is increased, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time to achieve better results. One potential reason could be that the exponential fitness function focuses on improving the usability score beyond the threshold, so that it ignores potentially good candidate patches. In conclusion, the quadratic function yields the best fitness function for both PBRA and TERA. We have therefore chosen the quadratic function as the default fitness function for our approach.

RQ3: Convergence rates of PBRA and TERA. The first RQ has demonstrated that PBRA outperforms TERA (on 36 out of 38 subjects in the mobile-friendliness test and 30 out of 38 in aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to that shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges when the number of evaluations is 50, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA needs only 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of friendliness for repaired pages, but TERA consumes less running time.

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both of them converge when the number of evaluations is 150. The converged value of PBRA is much smaller than that of TERA. Specifically, PBRA attains an aesthetic score equal to 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of population size. Population size plays an essential role in population-based meta-heuristic methods. A small population size may not be sufficient to find good results, while a large population size may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. From the figure, we can see that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 show worse results. Small populations are more likely to lose diversity over time because mating between individuals with similar genetic structures occurs regularly. Therefore, the search space of the algorithm is narrowed, which usually leads to poor results, like the result of the population size of 5 in this case. In contrast, large populations could make the algorithm consume more computation time in finding an optimal solution. This is why the algorithm with a population size of 15 has poor results. This observation is further reinforced by the experiment on the overall result, which is shown in Figure 7 with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity. Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.

Internal Validity. Threats to internal validity refer to errors in our implementation and experiments. To mitigate this risk, we have carefully examined our code and experiments to avoid potential errors. We have also repeated our experiments several times to ensure that the reported results are correct.

Construct Validity. Threats to construct validity correspond to the suitability of our evaluation metrics. Following MFix, we consider a web page mobile-friendly if it passes a threshold value of 80, which is the minimum PSIT score for good usability. These ratings are, however, subject to the PSIT evaluation criteria only. Although PSIT is a popular tool for assessing usability, we plan to also include other tools to assess usability more thoroughly.

Fig. 5: Comparison of the convergence rates of TERA and PBRA. Panels: (a) Usability, (b) Aesthetics.

Fig. 6: Impact of population size on convergence rate. Panels: (a) Usability, (b) Aesthetics.

VIII. CONCLUSION

In this paper, we introduced automated approaches based on meta-heuristic algorithms for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics) at the same time. To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluations on a dataset of 38 real-world defective web pages show that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique, MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithms on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function to optimize for both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

Fig. 7: Impact of population size on overall result. Panels: (a) Usability, (b) Aesthetics.

REFERENCES

[1] "Bing Mobile Friendly Test Tool," https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved: March 2020.

[2] "eMarketer Releases Updated Estimates for US Digital Users," https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved: March 2020.

[3] "Google Mobile Friendly Test Tool," https://search.google.com/test/mobile-friendly, 2017, retrieved: March 2020.

[4] "Google PageSpeed Insights Tool," https://developers.google.com/speed/pagespeed/insights/, 2017, retrieved: March 2020.

[5] "Google 2018 Consumer Study," https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved: March 2020.

[6] M. E. Akpinar and Y. Yesilada, "Vision based page segmentation algorithm: Extended and perceived success," in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to webpage segmentation," in Proceedings of the 17th International Conference on World Wide Web, ser. WWW '08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, "Tabu search-part I," ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, "S3: syntax- and semantic-guided repair synthesis via programming by examples," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, "History driven program repair," in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: A generic method for automatic software repair," IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, "Automated repair of mobile friendly problems in web pages," in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE '18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, "Automated repair of layout cross browser issues using search-based techniques," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, "Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques," Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, "Angelix: Scalable multiline program patch synthesis via symbolic analysis," in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, "SemFix: Program repair via semantic analysis," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, "Auto-locating and fix-propagating for HTML validation errors to PHP server-side code," in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, "Automated reasoning for web page layout," in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, "Automatic partitioning of web pages using clustering," in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, "Automated repair of HTML generation errors in PHP applications using string constraint solving," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, "Block-o-Matic: A web page segmentation framework," in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, "Automating presentation changes in dynamic web applications via collaborative hybrid analysis," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.

our objective is to determine the optimal patching solution to address all the issues. Let $S$ denote the set of all segments, $S = \{S_1, \ldots, S_n\}$. For each $S_i$ ($i = 1, \ldots, n$), let $I_i$ be the set of issues associated with $S_i$. Let $I_i^j$ be an issue belonging to $I_i$; then $j \in \{1, 2, 3\}$, where $I_i^1$, $I_i^2$, $I_i^3$ correspond to issues related to font size, tap target spacing, and content sizing, respectively. Let $PDG\langle S_i, I_i^j \rangle$ denote the PDG corresponding to $S_i$ and $I_i^j$. As mentioned in Section III, to fix issue $I_i^j$, we just need to determine the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$ (the other elements can then be adjusted by using the PDG). Our problem can be mathematically formulated as follows.

Input: $\{S_i, I_i^j, PDG\langle S_i, I_i^j \rangle\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)

Output: $X = \{X_i^j\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$), where $X_i^j$ denotes the value of the CSS properties associated with the root node of $PDG\langle S_i, I_i^j \rangle$

Objectives:

Maximize the usability score $U(X)$
Minimize the aesthetic score $A(X)$

where $U(X)$ indicates the mobile-friendliness of the page, which can be measured by using PSIT, and the aesthetic score $A(X)$ quantifies the layout differences between the original page and a transformed page to which a candidate patch has been applied. The aesthetic score is measured based on the Segment Model (SM) and the Intra-Segment Model (ISM), where:

• A Segment Model (SM) is defined as a directed complete graph in which the nodes are the segments found in the segmentation phase and the edges are labeled with the following relationships: (1) intersection, (2) containment, and (3) directional (i.e., above, below, left, and right).

• An Intra-Segment Model (ISM) is similar to the SM, but it is built for each segment, and its nodes are the HTML elements within the segment.

Using these models, the aesthetic score is computed by summing the size of the symmetric difference between each edge's labels in the SM and ISM of the original page and those of the transformed page [14].
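The summation above can be illustrated with a minimal sketch. It assumes that each model is stored as a dictionary mapping a directed edge (a pair of node names) to the set of relationship labels on that edge. The data layout, the function names `model_difference` and `aesthetic_score`, and the toy page are illustrative assumptions, not the representation used in [14].

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
Model = Dict[Edge, Set[str]]  # hypothetical encoding of an SM or ISM


def model_difference(original: Model, patched: Model) -> int:
    """Sum, over all edges, the size of the symmetric difference of labels."""
    total = 0
    for edge in original.keys() | patched.keys():
        before = original.get(edge, set())
        after = patched.get(edge, set())
        total += len(before ^ after)  # ^ is set symmetric difference
    return total


def aesthetic_score(original: List[Model], patched: List[Model]) -> int:
    """Add up the differences of the SM and of every ISM.

    Assumed layout: the first list item is the SM, followed by one ISM
    per segment, in the same order for both pages.
    """
    return sum(model_difference(o, p) for o, p in zip(original, patched))


# Toy example: the patch moves segment "nav" from above "main" to its left,
# while the elements inside the segment keep their relative positions.
sm_original = {("nav", "main"): {"above"}, ("main", "nav"): {"below"}}
sm_patched = {("nav", "main"): {"left"}, ("main", "nav"): {"right"}}
ism_original = {("h1", "p"): {"above"}}
ism_patched = {("h1", "p"): {"above"}}

score = aesthetic_score([sm_original, ism_original], [sm_patched, ism_patched])
print(score)  # 4
```

Each of the two changed edges contributes two labels (the removed one and the added one), so an unchanged page scores 0 and larger layout changes score higher.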

This is a multi-objective optimization problem, and thus we define the total fitness function as follows:

$$F(X) = \alpha \times E(U(X) - 80) + \beta A(X)$$

where $\alpha$ and $\beta$ are negative and positive weight factors, respectively, that can be tuned to obtain the best performance, and $E(Y)$ is a function defined as

$$E(Y) = \begin{cases} G(Y) & \text{if } Y < 0 \\ Y - 1 & \text{otherwise} \end{cases}$$

We explain the high-level idea of the above fitness function as follows. First, it is worth noting that the maximum mobile-friendliness score given by the Google PageSpeed Insights Tool is 100 and that the usability score should be at least 80 to guarantee the page's friendliness, as indicated in [14]. Motivated by this observation, we design the first term of the fitness function (i.e., the one representing the usability score) such that its value is significantly large when the usability score is smaller than the threshold, which is 80. Therefore, $G(Y)$ should be a reciprocal function that increases rapidly compared to a linear function. The selection of $G(Y)$ is discussed in detail in Section VI.

Note that, as $\alpha$ is a negative real number and $\beta$ is a positive real number, maximizing $U(X)$ and minimizing $A(X)$ correspond to minimizing $F(X)$. Accordingly, the objective now is to minimize the fitness function $F(X)$.
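To make the shape of the fitness function concrete, the following sketch evaluates $F(X)$ for two hypothetical pages. It takes the second branch of $E$ as $Y - 1$, as printed above, and uses the quadratic penalty $G(Y) = -Y^2$ that Section VI reports as the default. The weights `alpha = -1` and `beta = 1` and the input scores are placeholder assumptions; the paper does not state the tuned values.

```python
def g_quadratic(y: float) -> float:
    """Quadratic penalty G(Y) = -Y^2 (the default reported in Section VI)."""
    return -(y ** 2)


def e(y: float, g=g_quadratic) -> float:
    """E(Y) = G(Y) if Y < 0, and Y - 1 otherwise."""
    return g(y) if y < 0 else y - 1


def fitness(usability: float, aesthetic: float,
            alpha: float = -1.0, beta: float = 1.0,
            threshold: float = 80.0) -> float:
    """F(X) = alpha * E(U(X) - 80) + beta * A(X); lower values are better.

    alpha and beta are placeholder weights, not the authors' tuned values.
    """
    return alpha * e(usability - threshold) + beta * aesthetic


below = fitness(usability=70.0, aesthetic=5.0)  # Y = -10, E = -100
above = fitness(usability=90.0, aesthetic=5.0)  # Y = 10,  E = 9
print(below, above)  # 105.0 -4.0
```

With the same aesthetic score, the page that misses the threshold by 10 points is penalized by 100, while the page that exceeds it by 10 points gains only 9, which is the asymmetry the text describes.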

V. PROPOSED ALGORITHMS

The core of our method is a Particle Swarm Optimization (PSO) algorithm that repairs web pages by searching for the optimal value of the fitness function proposed in Section IV. PSO iteratively tries to improve candidate solutions, here dubbed particles, by moving these particles around in the search space according to simple mathematical formulae over the particles' position and velocity. Using PSO instead of random search helps to leverage knowledge about the changing trends of particles, improving and directing particles in the next iterations. PSO often provides high-quality solutions, but it may have a low convergence rate, which can be time-consuming. To tackle this issue, we also propose a second approach, which optimizes for running time. This second approach is based on the Tabu search algorithm, which shows a higher convergence rate than PSO, as will be discussed in Section VI.

In the following, we describe the proposed algorithms in detail.

A. PBRA: A PSO-Based Repair Algorithm

In this section, we leverage the Particle Swarm Optimization approach to determine the optimal values for the CSS properties that need to be adjusted. Basically, the PSO approach simulates the behaviour of bird flocking. In PSO, every single solution can be seen as a bird, which we call a particle. The goodness of every particle is measured by a fitness value based on a fitness function. All of the particles have velocities that direct their flight by following the current optimum particles. PSO starts with a set of random particles and searches for the optimum by updating generations. In each iteration, each particle is updated by following the best values found so far. Algorithm 1 presents the details of our proposed algorithm, which we describe in more detail below.

Algorithm 1: PBRA

Input: The set of segments and their corresponding issues and PDGs, $\{S_i, I_i^j, PDG\langle S_i, I_i^j \rangle\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)
Output: Repair values for the root nodes of the PDGs

    x ← a candidate suggested by GMFT [3]
    Initialize the first generation consisting of T particles, $P^0 = \{P_1^0, P_2^0, \ldots, P_T^0\}$, by using a Gaussian distribution around x
    gBest ← $P_1^0$
    for $P_t^0 \in P^0$ do
        $v^0(P_t)$ ← random number in (0, 1)
        $pBest_t$ ← $P_t^0$
        if $F(pBest_t) \le F(gBest)$ then
            gBest ← $pBest_t$
        end if
    end for
    k ← 0
    while the termination condition has not been met do
        for $P_t^k \in P^k$ do
            $v^{k+1}(P_t)$ ← Update($v^k(P_t)$, $pBest_t$, gBest, $P_t^k$)
            $P_t^{k+1}$ ← $P_t^k + v^{k+1}(P_t)$
            if $F(P_t^{k+1}) < F(pBest_t)$ then
                $pBest_t$ ← $P_t^{k+1}$
            end if
            if $F(pBest_t) \le F(gBest)$ then
                gBest ← $pBest_t$
            end if
        end for
        k ← k + 1
    end while
    return gBest

Let us assume that we have a set of $T$ particles, denoted as $P = \{P_1, \ldots, P_T\}$. The algorithm then follows the steps below.

• Step 1: The first step in the process of a Particle Swarm Optimization is the generation of an initial population. Each particle of the initial population represents a possible solution to the problem. The initialization process is often designed to ensure the diversity of the population, which plays an important role in population-based algorithms. In mobile-friendly problems, we create the first particle by using the value suggested by GMFT [3]. Then, we generate the initial population by perturbing adjustment factors based on a Gaussian distribution around the original values of the first particle.

• Step 2: Let gBest denote the best value obtained so far by all particles in the population, and let $pBest_t$ denote the best value that the $t$-th particle (i.e., $P_t$) has achieved so far. If the current value of the $t$-th particle is better than $pBest_t$, then $pBest_t$ is replaced by the current value of the $t$-th particle. The algorithm then searches all particles to find the best candidate, i.e., the one with the lowest value of the fitness function $F$, and assigns it to gBest.

• Step 3: The algorithm leverages information obtained from previous generations to improve the quality of particles by adjusting each particle's direction based on gBest and $pBest_t$, which are determined in Step 2. In detail, the velocity and the updated value of every particle are defined as follows. The velocity of particle $P_t$ at iteration $k + 1$ (denoted as $v^{k+1}(P_t)$) is determined as

$$v^{k+1}(P_t) = w \times v^k(P_t) + c_1 \times r_1 \times (pBest_t - P_t^k) + c_2 \times r_2 \times (gBest - P_t^k)$$

where $v^k(P_t)$ is the velocity of particle $P_t$ at the previous iteration, $w$ is the inertial coefficient, $c_1$ and $c_2$ are the personal and global acceleration coefficients, and $r_1$ and $r_2$ are random coefficients in $[0, 1]$ (see Table II). Based on the velocity, particle $P_t$ updates its value as follows:

$$P_t^{k+1} = P_t^k + v^{k+1}(P_t)$$

• Step 4: Steps 2 and 3 are repeated until the termination condition is satisfied. The final value of gBest is then deemed the optimal solution. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
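The four steps can be sketched in a few lines of Python. This is an illustrative toy, not the authors' Java implementation: the objective is a stand-in function rather than the web-page fitness $F$, the linear decay of the inertial coefficient from 0.9 to 0.4 is an assumption about how the start and end values in Table II are used, and `sigma` (the spread of the Gaussian perturbation) is a hypothetical parameter.

```python
import random


def pso_minimize(fitness, first_particle, population=10, evaluations=150,
                 w_start=0.9, w_end=0.4, c1=0.5, c2=0.5, sigma=0.3, seed=0):
    """Minimize `fitness` with a basic PSO; returns (gBest, F(gBest))."""
    rng = random.Random(seed)
    dim = len(first_particle)

    # Step 1: the first particle plus Gaussian perturbations around it.
    particles = [list(first_particle)]
    for _ in range(population - 1):
        particles.append([x + rng.gauss(0.0, sigma) for x in first_particle])
    velocities = [[rng.random() for _ in range(dim)] for _ in particles]

    # Step 2: personal bests and the global best (lowest fitness wins).
    p_best = [list(p) for p in particles]
    p_best_fit = [fitness(p) for p in particles]
    best = min(range(len(particles)), key=p_best_fit.__getitem__)
    g_best, g_best_fit = list(p_best[best]), p_best_fit[best]

    # Step 4: stop once the budget of fitness calls is spent.
    iterations = max(0, (evaluations - len(particles)) // population)
    for k in range(iterations):
        # Assumption: inertia decays linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * k / max(1, iterations - 1)
        for t, p in enumerate(particles):
            r1, r2 = rng.random(), rng.random()
            # Step 3: velocity update followed by the position update.
            for d in range(dim):
                velocities[t][d] = (w * velocities[t][d]
                                    + c1 * r1 * (p_best[t][d] - p[d])
                                    + c2 * r2 * (g_best[d] - p[d]))
                p[d] += velocities[t][d]
            fit = fitness(p)
            if fit < p_best_fit[t]:
                p_best[t], p_best_fit[t] = list(p), fit
            if p_best_fit[t] <= g_best_fit:
                g_best, g_best_fit = list(p_best[t]), p_best_fit[t]
    return g_best, g_best_fit


def sphere(x):
    """Stand-in objective with its minimum at (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


start = [0.0, 0.0, 0.0]
best, best_fit = pso_minimize(sphere, start)
print(best_fit <= sphere(start))  # True
```

Because the first particle is part of the initial population and gBest is only ever replaced by a candidate that is at least as good, the returned solution is never worse than the starting candidate.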

B. TERA: A Time-Efficient Repair Algorithm

In this section, we present the details of a repair algorithm based on Tabu search. The insight of our algorithm is that we start with a repair candidate suggested by GMFT [3]. Then, we attempt to iteratively improve repair candidates over generations, following a genetic programming approach [10]. Algorithm 2 shows more details of our proposed algorithm. We explain Algorithm 2 step by step below.

Algorithm 2: TERA

Input: The set of segments and their corresponding issues and PDGs, $\{S_i, I_i^j, PDG\langle S_i, I_i^j \rangle\}$ ($i = 1, \ldots, n$; $j \in \{1, 2, 3\}$)
Output: Repair values for the root nodes of the PDGs

    x ← a candidate suggested by GMFT [3]
    sBest ← x
    bestCandidate ← x
    tabuList ← [ ]
    tabuList.push(x)
    while the termination condition has not been met do
        sNeighborhood ← getNeighbors(bestCandidate)
        for sCandidate ∈ sNeighborhood do
            if F(sCandidate) ≤ F(bestCandidate) and sCandidate ∉ tabuList then
                bestCandidate ← sCandidate
            end if
        end for
        if F(bestCandidate) ≤ F(sBest) then
            sBest ← bestCandidate
        end if
        tabuList.push(bestCandidate)
        if tabuList.size() > $size_{tabu}$ then
            tabuList.removeFirst()
        end if
    end while
    return sBest

• Step 1: First, the algorithm creates a repair candidate suggested by GMFT [3] and assigns this candidate as the best candidate, denoted by sBest. It then inserts sBest into tabuList, which is a list of the candidates that have been visited.

• Step 2: Let bestCandidate be the best repair candidate found in the previous iteration, and suppose that bestCandidate is represented by

$$bestCandidate = \{x_1^1, x_1^2, x_1^3, \ldots, x_n^1, x_n^2, x_n^3\}$$

where $x_i^j$ is the value for repairing issue $I^j$ of segment $S_i$. The algorithm then determines the neighbors of bestCandidate, whose values fall into the following range: $\{x_1^1 \pm \delta, x_1^2 \pm \delta, x_1^3 \pm \delta, \ldots, x_n^1 \pm \delta, x_n^2 \pm \delta, x_n^3 \pm \delta\}$. Among the neighbors that are not in tabuList, it then assigns the one with the best fitness (i.e., the lowest value of $F$) as the bestCandidate of the current iteration. If bestCandidate achieves a better fitness than the current sBest, then sBest is replaced by bestCandidate. bestCandidate is then inserted into tabuList.

• Step 3: Step 2 is repeated until the termination condition is met. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
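The steps of TERA can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the objective is a stand-in for $F$, and `get_neighbors` draws each coordinate uniformly within $\pm\delta$, which is one plausible reading of the neighbor range described in Step 2. The default arguments mirror the values listed in Table III.

```python
import random
from collections import deque


def get_neighbors(candidate, delta, size, rng):
    """Sample `size` neighbors whose coordinates lie within +/- delta.

    Uniform sampling is an assumption; the text only gives the range.
    """
    return [tuple(x + rng.uniform(-delta, delta) for x in candidate)
            for _ in range(size)]


def tabu_search(fitness, start, evaluations=150, neighborhood_size=10,
                tabu_size=5, delta=0.3, seed=0):
    """Minimize `fitness` following the structure of Algorithm 2."""
    rng = random.Random(seed)
    s_best = best_candidate = tuple(start)
    s_best_fit = best_fit = fitness(s_best)
    # A bounded deque drops its oldest entry automatically, which plays the
    # role of removeFirst() once the list exceeds the tabu size.
    tabu = deque([s_best], maxlen=tabu_size)
    used = 1  # number of fitness calls so far
    while used + neighborhood_size <= evaluations:
        neighbors = get_neighbors(best_candidate, delta, neighborhood_size, rng)
        for cand in neighbors:
            fit = fitness(cand)
            used += 1
            if fit <= best_fit and cand not in tabu:
                best_candidate, best_fit = cand, fit
        if best_fit <= s_best_fit:
            s_best, s_best_fit = best_candidate, best_fit
        tabu.append(best_candidate)
    return s_best, s_best_fit


def sphere(x):
    """Stand-in objective with its minimum at (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


start = (0.0, 0.0, 0.0)
best, best_fit = tabu_search(sphere, start)
print(best_fit <= sphere(start))  # True
```

With real-valued candidates, two sampled neighbors almost never coincide, so the tabu check matters mainly when candidate values are discretized (for example, integer pixel sizes).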

C. Implementation Detail

We implemented our approach in Java, in approximately 5000 lines of code. For identifying the mobile-friendly problems in a web page, we used well-established tools, namely the GMFT [3] and PSIT [4] APIs. For evaluating aesthetics, we identify segments in a web page and build the Segment Model and Intra-Segment Model by building a DOM tree, following a method similar to that of [14], with Chrome browser v60.0 and Selenium WebDriver.

VI. EXPERIMENTAL RESULT

In this section, we empirically compare our approach with the well-known automated web page repair algorithm MFix [14] on a dataset of real-world web pages. Below, we describe our experimental methodology, including the dataset and evaluation metrics, followed by the research questions and numerical results.

A. Dataset and Evaluation metrics

To evaluate all algorithms, we perform the experiments on the Alexa dataset used in MFix [14]. The dataset contains 38 real-world subject web pages collected from the top 50 most visited websites across all seventeen categories. Table I presents the dataset's details, such as the URL and category of each web page. The dataset can be publicly accessed in the GitHub repository of MFix†. In terms of evaluation, we use two evaluation metrics proposed by S. Mahajan et al. [14]:

• Usability Score: We use the Google PageSpeed Insights Tool [4], which assigns pages a score in the range of 0 to 100, with 80 being the minimum score for a page to be considered mobile-friendly, to evaluate the usability of a web page. The higher the score, the better the usability of a web page.

• Aesthetic Score: The aesthetic score is a metric proposed in [14] for measuring how beautiful a web page looks. Based on the assumption that the original web pages have an aesthetic layout, it is accounted for by the relative visual positioning within the segments of a page, as captured by the Segment Model and Intra-Segment Model, between the machine-patched and original web pages. In this way, the lower the aesthetic score, the more aesthetically pleasing the page is. Note that, in this study, we chose to use automated validation, which measures the differences between the repaired page and the original page, instead of manual human evaluation to account for aesthetics. Although we acknowledge the imperfection of automated evaluation, human evaluation is expensive (e.g., the cost of hiring professional developers to ensure the quality of the manual evaluation) and still suffers from human biases. Automated validation, on the other hand, is less expensive and does not suffer from human biases. This is why we chose automated validation instead of manual human validation.

†https://github.com/USC-SQL/mfix/tree/master/ICSE_paper_data/subjects

For each web page, we run TERA, PBRA, and MFix ten times each and calculate the average to mitigate the non-determinism inherent in the approximation algorithms. To ensure fairness between algorithms, we also use the same number of evaluations (number of fitness calls) for all algorithms. We also perform Wilcoxon signed-rank statistical tests [25] to check whether the performance difference between our proposed solutions and the baseline is statistically significant. Our experiments were conducted on a MacBook Pro machine with a 2.2 GHz Intel Core i7 and 16 GB of RAM, running macOS Mojave 10.14.5.

B. Research Question

We answer four research questions, as described below.

RQ1: How effective is our approach in repairing mobile-friendly problems in web pages as compared to MFix? In this research question, we investigate the effectiveness of our approach as compared to MFix in generating repairs on the benchmark dataset that we describe in Section VI-A. Note that our comparison covers mobile-friendliness tasks as well as aesthetic layout tasks.

RQ2: How do different fitness functions impact the effectiveness of PBRA and TERA? By default, our approach uses a fitness function in which $G(Y)$ is quadratic. In this research question, we compare different fitness functions to explain why we selected the quadratic function as the default.

RQ3: How efficient is TERA as compared to PBRA? In this question, we compare the efficiency of TERA and PBRA by comparing their convergence rates.

RQ4: How do different population sizes impact the effectiveness of PBRA? By default, our approach uses an initial population size of 10. In this research question, we compare different values of the initial population size to explain why 10 is the best option.

TABLE I: Statistics of real-world web pages

ID   URL                               Category
1    https://www.discogs.com           Arts
2    https://xkcd.com                  Arts
3    http://www.wiley.com              Shopping
4    http://forum.gsmhosting.com/vbb   Home
5    https://www.irs.gov               Home
6    https://travel.state.gov          Home
7    https://arxiv.org                 Science
8    https://bitcointalk.org           Science
9    http://coinmarketcap.com          Science
10   http://www.intellicast.com        Science
11   https://www.ncbi.nlm.nih.gov      Science
12   http://sigmaaldrich.com           Science
13   http://www.weather.gov            Science
14   https://boardgamegeek.com         Games
15   http://www.finalfantasyxiv.com    Games
16   http://www.mmo-champion.com       Home
17   http://www.nexusmods.com          Games
18   http://nvidia.com                 Games
19   http://www.square-enix.com        Games
20   https://www.wowprogress.com       Games
21   https://bulbagarden.net           Kids and teens
22   http://lolcounter.com             Kids and teens
23   http://www.bom.gov.au             Kids and teens
24   http://onlinelibrary.wiley.com    Business
25   http://aamc.org                   Health
26   https://www.fragrantica.com       Health
27   http://us.battle.net              Kids and teens
28   http://blizzard.com               Kids and teens
29   http://drudgereport.com           News
30   https://www.irctc.co.in           Regional
31   http://dict.cc                    Reference
32   https://www.leo.org               Reference
33   http://correios.com.br            Society
34   http://www.flashscore.com         Sports
35   http://letour.fr                  Sports
36   http://rotoworld.com              Sports
37   http://us.soccerway.com           Sports
38   http://myway.com                  Computers

Fig. 2: Comparison of TERA, PBRA, and MFix in usability. The higher the usability score, the more mobile-friendly the page. Values above the red horizontal line drawn at 80 indicate that the GMFT considers a page to be mobile-friendly.

TABLE II: Default setting for PBRA

Parameter                             Symbol         Value
Population size                       $T$            10
Number of evaluations                 $E$            150
Inertial coefficient (start)          $\omega_s$     0.9
Inertial coefficient (end)            $\omega_e$     0.4
Acceleration coefficient (personal)   $c_1$          0.5
Acceleration coefficient (global)     $c_2$          0.5
Coefficient of randomness             $r_1, r_2$     [0, 1]
Neighbor range                        $\Delta$       0.3
Fitness function                      $G(Y)$         $-Y^2$

TABLE III: Default setting for TERA

Parameter               Symbol                  Value
Neighbor size           $size_{neighborhood}$   10
Number of evaluations   $E$                     150
Size of tabu list       $size_{tabu}$           5
Neighbor range          $\delta$                0.3
Fitness function        $G(Y)$                  $-Y^2$

C. Numerical result

RQ1: Our approach's effectiveness as compared to MFix. To answer RQ1, we compare the effectiveness of our approach, including PBRA and TERA (with the default settings in Table II and Table III), with a well-known automated repair technique for mobile-friendly problems, MFix [14]. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III is explained in RQ2 and RQ4.

Fig. 3: Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page. Values below the red horizontal line drawn at 100 indicate improvements over the baseline.

Fig. 4: Impact of fitness function on convergence rate. (a) Usability. (b) Aesthetics.

Experimental results show that PBRA outperforms MFix by generating repairs that have much better usability and aesthetic scores. Note that, as described in Section VI-A, the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing a web page is. From Figure 2, we can see that PBRA outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 presents the usability of the web pages repaired by TERA, PBRA, and MFix. The red line shows a threshold value of 80, which is the minimum PSIT score to be considered good usability by GMFT [3]. We can see that most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3. In Figure 3, we show the relative performance of our approaches versus the baseline technique MFix by dividing the aesthetic score of the PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We do this to normalize the results because aesthetic scores span a wide range, e.g., a score can take any value greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects. TERA also shows considerably better results than MFix on 24 out of 38 subjects.
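The normalization amounts to a single division per page, sketched below. Expressing the ratio as a percentage is an assumption inferred from the reference line at 100 in Figure 3, and the scores in the example are hypothetical rather than taken from the reported results.

```python
def relative_aesthetics(ours: float, mfix: float) -> float:
    """Aesthetic score of our patch as a percentage of the MFix score."""
    if mfix <= 0:
        raise ValueError("the baseline aesthetic score must be positive")
    return 100.0 * ours / mfix


# Hypothetical scores for one page (not taken from the paper's results).
ratio = relative_aesthetics(ours=42.0, mfix=60.0)
print(ratio < 100.0)  # True: this patch disturbs the layout less than MFix's
```

A value below 100 means the patched page differs less from the original layout than the MFix-patched page does, regardless of the absolute magnitude of the scores.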

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results point out that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003, and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041, and effect sizes of 0.991 and 0.435, respectively).
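For readers unfamiliar with the test, the sketch below computes the signed-rank sums $W^+$ and $W^-$ on which the Wilcoxon test is based, for paired scores of two techniques on the same pages. The helper name and the six score pairs are hypothetical; deriving the p-value and effect size from these sums is left to a statistics package and is not shown.

```python
def signed_rank_statistic(xs, ys):
    """Return (W_plus, W_minus) of the Wilcoxon signed-rank test."""
    # Pairs with a zero difference are dropped, as in the classic procedure.
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical usability scores of two techniques on six pages.
technique_a = [85, 90, 78, 92, 88, 81]
technique_b = [80, 91, 70, 85, 88, 79]
print(signed_rank_statistic(technique_a, technique_b))  # (14.0, 1.0)
```

The two sums always add up to $n(n+1)/2$ for $n$ non-zero differences (here 15), and a strongly lopsided split such as 14 versus 1 is what a small p-value reflects.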

RQ2: Impacts of the fitness function. In Section IV, we introduced a novel fitness function that combines the usability score and the aesthetic score. In the formula, the function $G(Y)$ plays an important role in making adjustments that speed up the search for friendly patches. However, $G(Y)$ also needs to be designed so as to avoid ignoring potentially good candidate patches.

We have chosen some functions that satisfy the conditions on $G(Y)$ (in Section IV) to experiment with, such as:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$

We compare the effectiveness of the fitness functions on two tasks: convergence rate, i.e., how fast a fitness function helps the approaches to converge, and overall result. In terms of overall result, Figure 4 shows that the quadratic function is the most effective, followed by the cubic function and the exponential function. In terms of convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations is increased, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time but achieve better results. One potential reason could be that the exponential fitness function focuses on improving the usability score to exceed the threshold, so that it ignores potentially good candidate patches. In conclusion, the quadratic function yields the best fitness function for both PBRA and TERA. We have therefore chosen the quadratic function as the default fitness function for our approach.

RQ3: Convergence rates of PBRA and TERA. The first RQ has demonstrated that PBRA outperforms TERA (on 36 out of 38 subjects in the mobile-friendliness test and 30 out of 38 in aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to the one shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges when the number of evaluations is 50, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA only needs 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of friendliness for repaired pages, but TERA consumes less running time.

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both of them converge when the number of evaluations is 150. The converged value of PBRA is much smaller than that of TERA. Specifically, PBRA attains an aesthetic score equal to 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of Population Size. Population size plays an essential role in population-based meta-heuristic methods. A small population size may not be sufficient to find good results, while a large population size may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. From the figure, we can see that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 show worse results. Small populations are more likely to lose diversity over time because mating between individuals with similar genetic structures occurs regularly. Therefore, the search space of the algorithm is narrowed, which usually leads to poor results, like the result of the population size of 5 in this case. In contrast, large populations could make the algorithm consume more computation time in finding an optimal solution. That is the reason why the algorithm with a population size of 15 has poor results. This observation is further reinforced by the experiment on the overall result, which is shown in Figure 7 with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity. Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.

Internal Validity. Threats to internal validity refer to errors in our implementation and experiments. To mitigate this risk, we have carefully examined our code and experiments to avoid potential errors. We have also repeated our experiments several times to ensure that the reported results are correct.

Construct Validity. Threats to construct validity correspond to the suitability of our evaluation metrics. Following MFix, we consider a web page mobile-friendly if it passes a threshold value of 80, which is the minimum PSIT score for good usability. These ratings are, however, subject to the PSIT evaluation criteria only. Although PSIT is a popular tool to assess usability, we plan to also include other tools to assess usability more thoroughly.

Fig. 5: Comparison of the convergence rates of TERA and PBRA. (a) Usability. (b) Aesthetics.

Fig. 6: Impact of population size on convergence rate. (a) Usability. (b) Aesthetics.

VIII. CONCLUSION

In this paper, we introduced automated approaches based on meta-heuristic algorithms for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics) at the same time. To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluations on a dataset of 38 real-world defective web pages show that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique, MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithms on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function to optimize for both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

Fig. 7: Impact of population size on overall result. (a) Usability. (b) Aesthetics.

REFERENCES

[1] “Bing Mobile Friendly Test Tool,” https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved: March 2020.

[2] “eMarketer Releases Updated Estimates for US Digital Users,” https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved: March 2020.

[3] “Google Mobile Friendly Test Tool,” https://search.google.com/test/mobile-friendly, 2017, retrieved: March 2020.

[4] “Google PageSpeed Insights Tool,” https://developers.google.com/speed/pagespeed/insights, 2017, retrieved: March 2020.

[5] “Google 2018 Consumer Study,” https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved: March 2020.

[6] M. E. Akpinar and Y. Yesilada, “Vision based page segmentation algorithm: Extended and perceived success,” in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, “A graph-theoretic approach to webpage segmentation,” in Proceedings of the 17th International Conference on World Wide Web, ser. WWW ’08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, “Tabu search - part I,” ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, “S3: Syntax- and semantic-guided repair synthesis via programming by examples,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, “History driven program repair,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automatic software repair,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, “Automated repair of mobile friendly problems in web pages,” in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE ’18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, “Automated repair of layout cross browser issues using search-based techniques,” in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, “Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques,” Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, “Angelix: Scalable multiline program patch synthesis via symbolic analysis,” in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, “SemFix: Program repair via semantic analysis,” in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Auto-locating and fix-propagating for HTML validation errors to PHP server-side code,” in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, “Automated reasoning for web page layout,” in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, “Automatic partitioning of web pages using clustering,” in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schäfer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, “Automated repair of HTML generation errors in PHP applications using string constraint solving,” in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gançarski, “Block-o-matic: A web page segmentation framework,” in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, “Automating presentation changes in dynamic web applications via collaborative hybrid analysis,” in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.


Algorithm 1 PBRA

Input: The set of segments and their corresponding issues and PDGs: $S_i$, $I_i^j$, $PDG\langle S_i, I_i^j \rangle$ $(i = 1, \dots, n;\ j \in \{1, 2, 3\})$
Output: Repair values for the root nodes of PDGs

x ← a candidate suggested by GMFT [3]
Initialize the first generation consisting of T particles $P^0 = \{P_1^0, P_2^0, \dots, P_T^0\}$ by using a Gaussian distribution
gBest ← $P_1^0$
for $P_t^0 \in P^0$ do
    $v^0(P_t)$ ← random number in (0, 1)
    $pBest_t$ ← $P_t^0$
    if $F(pBest_t) \le F(gBest)$ then
        gBest ← $pBest_t$
    end if
end for
k ← 0
while the termination condition has not been met do
    for $P_t^k \in P^k$ do
        $v^{k+1}(P_t)$ ← Update($v^k(P_t)$, $pBest_t$, gBest, $P_t^k$)
        $P_t^{k+1}$ ← $P_t^k + v^{k+1}(P_t)$
        if $F(P_t^{k+1}) < F(pBest_t)$ then
            $pBest_t$ ← $P_t^{k+1}$
        end if
        if $F(pBest_t) \le F(gBest)$ then
            gBest ← $pBest_t$
        end if
    end for
    k ← k + 1
end while
return gBest

Each particle of the initial population represents a possible solution to the problem. The initialization process is often designed to ensure the diversity of the population, which plays an important role in population-based algorithms. For mobile-friendly problems, we create the first particle by using the value suggested by GMFT [3]. Then, we generate the rest of the initial population by perturbing the adjustment factors with a Gaussian distribution centered on the original values of the first particle.

• Step 2: Let gBest denote the best value obtained so far by all particles in the population, and let $pBest_t$ denote the best value that the t-th particle (i.e., $P_t$) has achieved so far. If the current value of the t-th particle is better than $pBest_t$, then $pBest_t$ is replaced by the current value of the t-th particle. The algorithm then searches all particles to find the candidate with the best fitness and assigns it to gBest.

• Step 3: The algorithm leverages information obtained from previous generations to improve the quality of the particles by adjusting each particle's direction based on gBest and $pBest_t$, which are determined in Step 2. In detail, the velocity and the updated value of every particle are defined as follows. The velocity of particle $P_t$ at iteration $k + 1$ (denoted as $v^{k+1}(P_t)$) is determined as

$$v^{k+1}(P_t) = w \times v^k(P_t) + c_1 \times r_1 \times \left(pBest_t - P_t^k\right) + c_2 \times r_2 \times \left(gBest - P_t^k\right)$$

where $v^k(P_t)$ is the velocity of particle $P_t$ at the previous iteration, $w$ is the inertial coefficient (denoted ω in Table II), $c_1$ and $c_2$ are the acceleration coefficients, and $r_1$ and $r_2$ are the coefficients of randomness. Based on the velocity, particle $P_t$ updates its value as follows:

$$P_t^{k+1} = P_t^k + v^{k+1}(P_t)$$

• Step 4: Steps 2 and 3 are repeated until the termination condition is satisfied. The final value of gBest is then taken as the solution. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
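The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Java implementation: the function name `pso_minimize`, the toy fitness function in the usage note, the Gaussian spread `sigma`, and the linear decrease of the inertial coefficient from 0.9 to 0.4 are all assumptions made for the example (Table II lists only the start and end values). Following the comparisons in Algorithm 1, a lower fitness value is treated as better.

```python
import random


def pso_minimize(fitness, seed_candidate, n_particles=10, max_evals=150,
                 w_start=0.9, w_end=0.4, c1=0.5, c2=0.5, sigma=0.3, rng=None):
    """PSO sketch in the spirit of Algorithm 1 (lower fitness is better)."""
    rng = rng or random.Random(0)
    dim = len(seed_candidate)

    # Step 1: the first particle is the seed (e.g. a tool-suggested value);
    # the others are Gaussian perturbations around it (sigma is an assumption).
    particles = [list(seed_candidate)]
    for _ in range(n_particles - 1):
        particles.append([x + rng.gauss(0.0, sigma) for x in seed_candidate])
    velocities = [[rng.random() for _ in range(dim)] for _ in particles]

    # Step 2: personal bests and the global best.
    p_best = [list(p) for p in particles]
    p_best_fit = [fitness(p) for p in particles]
    evals = n_particles
    g_idx = min(range(n_particles), key=lambda t: p_best_fit[t])
    g_best, g_best_fit = list(p_best[g_idx]), p_best_fit[g_idx]

    # Steps 3 and 4: update velocities and positions until the budget of
    # fitness evaluations is used up.
    while evals < max_evals:
        # Assumed schedule: inertia decreases linearly with the spent budget.
        w = w_start - (w_start - w_end) * evals / max_evals
        for t in range(n_particles):
            if evals >= max_evals:
                break
            r1, r2 = rng.random(), rng.random()
            for d in range(dim):
                velocities[t][d] = (w * velocities[t][d]
                                    + c1 * r1 * (p_best[t][d] - particles[t][d])
                                    + c2 * r2 * (g_best[d] - particles[t][d]))
                particles[t][d] += velocities[t][d]
            fit = fitness(particles[t])
            evals += 1
            if fit < p_best_fit[t]:
                p_best[t], p_best_fit[t] = list(particles[t]), fit
            if p_best_fit[t] <= g_best_fit:
                g_best, g_best_fit = list(p_best[t]), p_best_fit[t]
    return g_best, g_best_fit
```

For instance, `pso_minimize(lambda p: sum((x - 1.0) ** 2 for x in p), [0.0, 0.0])` searches for a point near (1, 1) within 150 fitness evaluations. Counting evaluations rather than generations mirrors the termination condition described in Step 4.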

B. TERA: a Time Efficient Repair Algorithm

In this section, we present the details of a repair algorithm based on Tabu search. The insight of our algorithm is that we start with a repair candidate that is suggested by GMFT [3]. Then, we attempt to iteratively improve repair candidates over generations, following a genetic programming approach [10]. Algorithm 2 shows more details of our proposed algorithm. We explain Algorithm 2 step by step below.

Algorithm 2 TERA

Input: The set of segments and their corresponding issues and PDGs: $S_i$, $I_i^j$, $PDG\langle S_i, I_i^j \rangle$ $(i = 1, \dots, n;\ j \in \{1, 2, 3\})$
Output: Repair values for the root nodes of PDGs

x ← a candidate suggested by GMFT [3]
sBest ← x
bestCandidate ← x
tabuList ← [ ]
tabuList.push(x)
while the termination condition has not been met do
    sNeighborhood ← getNeighbors(bestCandidate)
    for sCandidate ∈ sNeighborhood do
        if F(sCandidate) ≤ F(bestCandidate) and sCandidate ∉ tabuList then
            bestCandidate ← sCandidate
        end if
    end for
    if F(bestCandidate) ≤ F(sBest) then
        sBest ← bestCandidate
    end if
    tabuList.push(bestCandidate)
    if tabuList.size() > $size_{tabu}$ then
        tabuList.removeFirst()
    end if
end while
return sBest

• Step 1: First, the algorithm creates a repair candidate suggested by GMFT [3] and assigns this candidate as the best candidate, denoted by sBest. It then inserts sBest into tabuList, which is a list of the candidates that have been visited.

• Step 2: Let bestCandidate be the best repair candidate found in the previous iteration, and suppose that bestCandidate is represented by

$$bestCandidate = \{x_1^1, x_1^2, x_1^3, \dots, x_n^1, x_n^2, x_n^3\}$$

where $x_i^j$ is the value for repairing issue $I^j$ of segment $S_i$. The algorithm then determines the neighbors of bestCandidate, whose values fall into the following range: $\{x_1^1 \pm \delta, x_1^2 \pm \delta, x_1^3 \pm \delta, \dots, x_n^1 \pm \delta, x_n^2 \pm \delta, x_n^3 \pm \delta\}$. Among the neighbors that are not in tabuList, it then assigns the one with the best fitness as the bestCandidate of the current iteration. If bestCandidate achieves a better fitness than the current sBest, then sBest is replaced by bestCandidate. bestCandidate is then inserted into tabuList, and the oldest entry is removed once the list exceeds $size_{tabu}$.

• Step 3: Step 2 is repeated until the termination condition is met. In this work, we used the number of evaluations, i.e., the number of calls to the fitness function, as our termination condition.
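The steps above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' Java code: the name `tabu_search`, the uniform sampling of neighbors within ±δ, and the toy fitness function in the usage note are hypothetical, while the default values mirror Table III. As in Algorithm 2, a lower fitness value is treated as better.

```python
import random


def tabu_search(fitness, seed_candidate, delta=0.3, n_neighbors=10,
                tabu_size=5, max_evals=150, rng=None):
    """Tabu-search sketch in the spirit of Algorithm 2 (lower is better)."""
    rng = rng or random.Random(0)

    # Step 1: start from the seed candidate and remember it as visited.
    s_best = tuple(seed_candidate)
    s_best_fit = fitness(s_best)
    best_candidate, best_candidate_fit = s_best, s_best_fit
    evals = 1
    tabu_list = [s_best]

    # Steps 2 and 3: explore neighbors until the evaluation budget is spent.
    while evals < max_evals:
        # Assumption: each neighbor shifts every value by a uniform offset
        # drawn from [-delta, +delta].
        neighborhood = [
            tuple(x + rng.uniform(-delta, delta) for x in best_candidate)
            for _ in range(n_neighbors)
        ]
        for candidate in neighborhood:
            if evals >= max_evals:
                break
            fit = fitness(candidate)
            evals += 1
            if fit <= best_candidate_fit and candidate not in tabu_list:
                best_candidate, best_candidate_fit = candidate, fit
        if best_candidate_fit <= s_best_fit:
            s_best, s_best_fit = best_candidate, best_candidate_fit
        # Record the visited candidate and keep the list bounded.
        tabu_list.append(best_candidate)
        if len(tabu_list) > tabu_size:
            tabu_list.pop(0)
    return list(s_best), s_best_fit
```

A call such as `tabu_search(lambda p: sum((x - 1.0) ** 2 for x in p), [0.0, 0.0])` moves the seed toward (1, 1). With continuous repair values an exact revisit is rare, so in this sketch the tabu check matters mostly when values are discretized.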

C. Implementation Detail

We implemented our approach in Java, consisting of approximately 5000 lines of code. For identifying the mobile-friendly problems in a web page, we used well-established tools, namely the GMFT [3] and PSIT [4] APIs. For evaluating aesthetics, we identify segments in a web page and build the Segment Model and Intra-Segment Model by building a DOM tree, following a method similar to [14], with Chrome browser v60.0 and Selenium WebDriver.

VI. EXPERIMENTAL RESULT

In this section, we empirically compare our approach with the well-known automated webpage repair algorithm MFix [14] on a dataset of real-world webpages. Below, we describe our experimental methodology, including the dataset and evaluation metrics, followed by the research questions and numerical results.

A. Dataset and Evaluation metrics

To evaluate all algorithms, we perform the experiments on the Alexa dataset used in MFix [14]. The dataset contains 38 real-world subject web pages collected from the top 50 most visited websites across all seventeen categories. Table I presents the dataset's details, such as the URL and category of each web page. The dataset can be publicly accessed in the GitHub repository of MFix.† In terms of evaluation, we use two evaluation metrics proposed by S. Mahajan et al. [14]:

• Usability Score: We use the Google PageSpeed Insights Tool [4], which assigns pages a score in the range of 0 to 100, with a score of 80 or above indicating a mobile-friendly page, to evaluate the usability of a web page. The higher the score, the better the usability of a web page.

• Aesthetic Score: The aesthetic score is a metric proposed in [14] for measuring how beautiful a web page looks. Assuming that the original web page has an aesthetically pleasing layout, the score captures the differences in relative visual positioning, as represented by the Segment Model and Intra-Segment Model, between the machine-patched page and the original page. Thus, the lower the aesthetic score, the more aesthetically pleasing the page. Note that in this study, we chose automated validation, which measures the differences between the repaired page and the original page, instead of manual human evaluation to account for aesthetics. Although we acknowledge the imperfection of automated evaluation, human evaluation is expensive (e.g., the cost of hiring professional developers to ensure the quality of the manual evaluation) and still suffers from human biases. Automated validation, on the other hand, is less expensive and does not suffer from human biases. This is why we chose automated validation over manual human validation.

† https://github.com/USC-SQL/mfix/tree/master/ICSE_paper_data/subjects

For each web page, we run TERA, PBRA, and MFix ten times each and calculate the average to mitigate the non-determinism inherent in the approximation algorithms. To ensure fairness between algorithms, we also use the same number of evaluations (number of fitness calls) for all algorithms. We also perform Wilcoxon signed-rank statistical tests [25] to check whether the performance difference between our proposed solutions and the baseline is statistically significant. Our experiments were conducted on a MacBook Pro machine with an Intel Core i7 2.2 GHz processor and 16 GB of RAM, running macOS Mojave 10.14.5.
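To make the paired comparison concrete, the following Python sketch computes the two rank sums of the Wilcoxon signed-rank test for per-page average scores of two techniques. It is only an illustration: the paper does not state which tool was used to run the test, the function name `signed_rank_sums` is hypothetical, and the scores in the usage note are made up. Turning the rank sums into a p-value is left to a statistics package.

```python
def signed_rank_sums(scores_a, scores_b):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped, and tied absolute differences share the
    average of the ranks they occupy.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)

    i = 0
    while i < len(ordered):
        # Find the run of equal absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus
```

For example, with made-up usability averages `[85, 90, 78, 92, 88]` for one technique and `[80, 91, 70, 85, 88]` for another, the differences are 5, -1, 8, 7 and 0; the zero is dropped, and the function returns rank sums of 9.0 and 1.0. A large imbalance between the two sums is what the test treats as evidence of a systematic difference.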

B. Research Question

We answer four research questions, as described below.

RQ1: How effective is our approach in repairing mobile-friendly problems in web pages as compared to MFix? In this research question, we investigate the effectiveness of our approach, as compared to MFix, in generating repairs for the benchmark dataset that we describe in Section VI-A. Note that our comparison covers both mobile-friendliness and aesthetic layout.

RQ2: How do different fitness functions impact the effectiveness of PBRA and TERA? By default, our approach uses a fitness function in which G(Y) is quadratic. In this research question, we compare different fitness functions to explain why we selected the quadratic function as the default.

RQ3: How efficient is TERA as compared to PBRA? In this question, we investigate the efficiency of TERA and PBRA by comparing their convergence rates.

RQ4: How do different population sizes impact the effectiveness of PBRA? By default, our approach uses an initial population size of 10. In this research question, we compare different values of the initial population size to explain why 10 is the best option.

TABLE I: Statistics of real-world web pages

ID   URL                                Category
1    https://www.discogs.com            Arts
2    https://xkcd.com                   Arts
3    http://www.wiley.com               Shopping
4    http://forum.gsmhosting.com/vbb    Home
5    https://www.irs.gov                Home
6    https://travel.state.gov           Home
7    https://arxiv.org                  Science
8    https://bitcointalk.org            Science
9    http://coinmarketcap.com           Science
10   http://www.intellicast.com         Science
11   https://www.ncbi.nlm.nih.gov       Science
12   http://sigmaaldrich.com            Science
13   http://www.weather.gov             Science
14   https://boardgamegeek.com          Games
15   http://www.finalfantasyxiv.com     Games
16   http://www.mmo-champion.com        Home
17   http://www.nexusmods.com           Games
18   http://nvidia.com                  Games
19   http://www.square-enix.com         Games
20   https://www.wowprogress.com        Games
21   https://bulbagarden.net            Kids and teens
22   http://lolcounter.com              Kids and teens
23   http://www.bom.gov.au              Kids and teens
24   http://onlinelibrary.wiley.com     Business
25   http://aamc.org                    Health
26   https://www.fragrantica.com        Health
27   http://us.battle.net               Kids and teens
28   http://blizzard.com                Kids and teens
29   http://drudgereport.com            News
30   https://www.irctc.co.in            Regional
31   http://dict.cc                     Reference
32   https://www.leo.org                Reference
33   http://correios.com.br             Society
34   http://www.flashscore.com          Sports
35   http://letour.fr                   Sports
36   http://rotoworld.com               Sports
37   http://us.soccerway.com            Sports
38   http://myway.com                   Computers

Fig. 2: Comparison of TERA, PBRA, and MFix in usability. The higher the usability score, the more mobile-friendly the page. Values above the red horizontal line drawn at 80 indicate that the GMFT considers a page to be mobile-friendly.

TABLE II: Default setting for PBRA

Parameter                             Symbol          Value
Population size                       T               10
Number of evaluations                 E               150
Inertial coefficient (start)          ω_s             0.9
Inertial coefficient (end)            ω_e             0.4
Acceleration coefficient (personal)   c_1             0.5
Acceleration coefficient (global)     c_2             0.5
Coefficient of randomness             r_1, r_2        [0, 1]
Neighbor range                        ∆               0.3
Fitness function                      G(Y)            $-Y^2$

TABLE III: Default setting for TERA

Parameter                Symbol                 Value
Neighbor size            size_neighborhood      10
Number of evaluations    E                      150
Size of tabu list        size_tabu              5
Neighbor range           δ                      0.3
Fitness function         G(Y)                   $-Y^2$

C. Numerical result

RQ1: Our approach's effectiveness as compared to MFix. To answer RQ1, we compare the effectiveness of our approach, including PBRA and TERA (with the default settings in Table II and Table III), with a well-known automated repair technique for mobile-friendly problems, MFix [14]. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III is explained in RQ2 and RQ4.

Fig. 3: Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page. Values below the red horizontal line drawn at 100 indicate improvements over the baseline.

(a) Usability (b) Aesthetics

Fig. 4: Impact of fitness function on convergence rate

Experimental results show that PBRA outperforms MFix by generating repairs that have much better usability and aesthetic scores. Note that, as we described in Section VI-A, the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing a web page is. Figure 2 shows that PBRA outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 presents the usability of the web pages repaired by TERA, PBRA, and MFix. The red line shows a threshold value of 80, which is the minimum PSIT score to be considered for good usability by GMFT [3]. We can see that most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3. In Figure 3, we show the relative performance of our approaches versus the baseline technique MFix by dividing the aesthetic score of the PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We normalize the results in this way because the aesthetic score has a wide range, e.g., it can take any value greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects in aesthetics. TERA also shows considerably better results than MFix on 24 out of 38 subjects.
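The normalization described above amounts to a simple ratio. The short Python sketch below assumes the ratio is expressed as a percentage, which is our reading of the reference line drawn at 100 in Figure 3; the function name `relative_aesthetic_score` and the numbers in the usage note are illustrative, not values from the experiments.

```python
def relative_aesthetic_score(our_score, mfix_score):
    """Aesthetic score of a patched page relative to the MFix-patched page.

    Values below 100 mean the page is closer to the original layout than
    the MFix repair (lower aesthetic scores are better).
    """
    if mfix_score <= 0:
        raise ValueError("the baseline aesthetic score must be positive")
    return 100.0 * our_score / mfix_score


def count_improvements(our_scores, mfix_scores):
    """Number of subjects whose relative score falls below 100."""
    return sum(
        1 for ours, base in zip(our_scores, mfix_scores)
        if relative_aesthetic_score(ours, base) < 100.0
    )
```

For example, a page with an aesthetic score of 43.0 against a baseline of 86.0 has a relative score of 50.0, which counts as an improvement over the baseline.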

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results show that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003, and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041, and effect sizes of 0.991 and 0.435, respectively).

RQ2: Impacts of the fitness function. In Section IV, we introduce a novel fitness function to combine the usability score and the aesthetic score. In the formula, the function G(Y) plays an important role in making adjustments to speed up the search for mobile-friendly patches. However, G(Y) also needs to be designed to avoid ignoring potentially good candidate patches.

We have chosen several functions that satisfy the conditions on G(Y) (see Section IV) for our experiments:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$

We compare the effectiveness of the fitness functions on two tasks: the convergence rate, i.e., how fast a fitness function helps the approaches to converge, and the overall result. For the overall result, Figure 4 shows that the quadratic function is the most effective, followed by the cubic function and the exponential function. For the convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations is increased, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time but achieve better results. One potential reason could be that the exponential fitness function focuses on improving the usability score to exceed the threshold, so it ignores potentially good candidate patches. In conclusion, the quadratic function is the best fitness function for both PBRA and TERA. We have therefore chosen the quadratic function as the default fitness function for our approach.

RQ3: Convergence rates of PBRA and TERA. RQ1 has demonstrated that PBRA outperforms TERA (on 36 out of 38 subjects in the mobile-friendliness test and on 30 out of 38 in aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to the one shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges when the number of evaluations is 50, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA needs only 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of mobile-friendliness for repaired pages, but TERA consumes less running time.

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both of them converge when the number of evaluations is 150. The converged value of PBRA is much smaller than that of TERA. Specifically, PBRA attains an aesthetic score equal to 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of Population Size. Population size plays an essential role in population-based meta-heuristic methods. A small population size may not be sufficient to find good results, while a large population size may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. The figure shows that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 yield worse results. Small populations are more likely to lose diversity over time because mating between individuals with similar genetic structures occurs regularly. Therefore, the search space of the algorithm is narrowed, which usually leads to poor results, such as the result for the population size of 5 in this case. In contrast, large populations can make the algorithm consume more computation time to find an optimal solution. This is why the algorithm with a population size of 15 has poor results under the same evaluation budget. This observation is further reinforced by the experiment on the overall result, which is shown in Figure 7 with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity. Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.

Internal Validity. Threats to internal validity refer to errors in our implementation and experiments. To mitigate this risk, we have carefully examined our code and experiments to avoid potential errors. We have also repeated our experiments several times to ensure that the reported results are correct.

Construct Validity. Threats to construct validity correspond to the suitability of our evaluation metrics. Following MFix, we consider a web page mobile-friendly if it passes a threshold value of 80, which is the minimum PSIT score for good usability. These ratings are, however, subject to the PSIT evaluation criteria only. Although PSIT is a popular tool to assess usability, we plan to also include other tools to assess usability more thoroughly.

(a) Usability (b) Aesthetics

Fig. 5: Comparison of the convergence rates of TERA and PBRA

(a) Usability (b) Aesthetics

Fig. 6: Impact of population size on convergence rate

VIII. CONCLUSION

In this paper, we introduced automated approaches based on meta-heuristic algorithms for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics) at the same time. To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluations on a dataset of 38 real-world defective webpages show that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique, MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithms on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function that optimizes for both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

(a) Usability (b) Aesthetics

Fig. 7: Impact of population size on overall result

REFERENCES

[1] "Bing Mobile Friendly Test Tool," https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved March 2020.

[2] "eMarketer Releases Updated Estimates for US Digital Users," https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved March 2020.

[3] "Google Mobile Friendly Test Tool," https://search.google.com/test/mobile-friendly, 2017, retrieved March 2020.

[4] "Google PageSpeed Insights Tool," https://developers.google.com/speed/pagespeed/insights, 2017, retrieved March 2020.

[5] "Google 2018 Consumer Study," https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved March 2020.

[6] M. E. Akpinar and Y. Yesilada, "Vision based page segmentation algorithm: Extended and perceived success," in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to webpage segmentation," in Proceedings of the 17th International Conference on World Wide Web, ser. WWW '08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, "Tabu search - part I," ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, "S3: syntax- and semantic-guided repair synthesis via programming by examples," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, "History driven program repair," in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: A generic method for automatic software repair," IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, "Automated repair of mobile friendly problems in web pages," in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE '18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, "Automated repair of layout cross browser issues using search-based techniques," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, "Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques," Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, "Angelix: Scalable multiline program patch synthesis via symbolic analysis," in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, "SemFix: Program repair via semantic analysis," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, "Auto-locating and fix-propagating for HTML validation errors to PHP server-side code," in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, "Automated reasoning for web page layout," in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, "Automatic partitioning of web pages using clustering," in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, "Automated repair of HTML generation errors in PHP applications using string constraint solving," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, "Block-o-Matic: A web page segmentation framework," in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, "Automating presentation changes in dynamic web applications via collaborative hybrid analysis," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.


bull Step 2 Let bestCandidate be the best repair candidatefound in the previous iteration and suppose that bestCan-didate is represented by

bestCandidate = x11 x21 x31 x1n x2n x3n

where xji is the value for repairing issue Ij of segmentxi The algorithm then determines neighbors of best-Candidate whose values fall into the following rangex11plusmnδ x21plusmnδ x31plusmnδ x1nplusmnδ x2nplusmnδ x3nplusmnδ Amongthe neighbors it then assigns the one with the highestfitness as bestCandidate of the current iteration If best-Candidate achieves a better fitness than the current sBestthen sBest is replaced by bestCandidate bestCandidateis then inserted into the TabuList

bull Step 3 Step 2 is repeated until the termination conditionis matched In this work we used the number of evalua-tions ie the number of calls to the fitness functions asour termination condition

C Implementation Detail

We implemented our approach in Java consisting of ap-proximately 5000 lines of code For identifying the mobile-friendly problems in a web page we used well-establishedtools such as GMFT [3] and PSIT [4] APIs For evaluatingaesthetics we identify segments in a web page and build theSegment Model and Intra-Segment Model by building a DOMtree following a similar method in [14] with Chrome browserv600 and Selenium WebDriver

VI EXPERIMENTAL RESULT

In this section we empirically compare our approachwith the well-known automated webpage repair algorithmMFix [14] on a dataset of real-world webpages Below wedescribe our experimental methodology including the data setand evaluation metrics followed by research questions andnumerical results

A. Dataset and Evaluation Metrics

To evaluate all algorithms, we perform the experiments on the Alexa dataset used in MFix [14]. The dataset contains 38 real-world subject web pages collected from the top 50 most visited websites across all seventeen categories. Table I presents the dataset's details, such as the URL and category of each web page. The dataset can be publicly accessed in the GitHub repository of MFix†. In terms of evaluation, we use two evaluation metrics proposed by S. Mahajan et al. [14]:

• Usability Score: We use the Google PageSpeed Insights Tool [4], which assigns pages a score in the range of 0 to 100, with a score of 80 or above indicating an entirely mobile-friendly page, to evaluate the usability of a web page. The higher the score, the better the usability of a web page.

• Aesthetic Score: The aesthetic score is a metric proposed in [14] for measuring how beautiful a web page looks. Based on the assumption that the original web pages have an aesthetic layout, it accounts for the change in relative visual positioning within the segments of a page, as captured by the Segment Model and Intra-Segment Model, between the machine-patched and original web pages. Thus, the lower the aesthetic score, the more aesthetically pleasing the page is. Note that in this study we chose to use automated validation, which measures the differences between the repaired page and the original page, instead of manual human evaluation to account for aesthetics. Although we acknowledge the imperfection of automated evaluation, human evaluation is expensive (e.g., the cost of hiring professional developers to ensure the quality of the manual evaluation) and still suffers from human biases. Automated validation, on the other hand, is less expensive and does not suffer from human biases. This is why we choose automated validation instead of manual human validation.

† https://github.com/USC-SQL/mfix/tree/master/ICSE paper data/subjects

For each web page, we run TERA, PBRA, and MFix ten times each and calculate the average to mitigate the non-determinism inherent in the approximation algorithms. To ensure fairness between algorithms, we also use the same number of evaluations (number of fitness calls) for all algorithms. We also perform Wilcoxon signed-rank statistical tests [25] to check whether the performance difference between our proposed solutions and the baseline is statistically significant. Our experiments were conducted on a MacBook Pro machine with an Intel Core i7 2.2 GHz and 16 GB of RAM, running macOS Mojave 10.14.5.
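To make this protocol concrete, the sketch below averages repeated runs for one page and computes the rank sums that underlie the Wilcoxon signed-rank test for paired per-page scores. It is only an illustration: the helper names are hypothetical, the numbers in the usage note are invented, and the p-value computation is left to a statistics package.

```python
from statistics import mean


def mean_over_runs(runs):
    """Average the scores obtained from repeated runs on a single page."""
    return mean(runs)


def signed_rank_statistic(xs, ys):
    """Return (W_plus, W_minus) for the paired samples xs and ys.

    Zero differences are dropped and tied absolute differences share
    their average rank, as in the usual signed-rank procedure.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j + 1) / 2  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return w_plus, w_minus
```

As a usage example with made-up scores, `signed_rank_statistic([85, 90, 78, 92], [80, 91, 70, 92])` drops the tied pair and ranks the remaining differences 5, -1, and 8 by absolute value, giving rank sums of 5 for the positive differences and 1 for the negative one.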

B. Research Question

We answer four research questions, as described below.

RQ1: How effective is our approach in repairing mobile-friendly problems in web pages as compared to MFix? In this research question, we investigate the effectiveness of our approach as compared to MFix in generating repairs for the benchmark dataset that we describe in Section VI-A. Note that our comparison covers both the mobile-friendliness task and the aesthetic layout task.

RQ2: How do different fitness functions impact the effectiveness of PBRA and TERA? By default, our approach uses a fitness function in which G(Y) is quadratic. In this research question, we compare different fitness functions to explain why we selected the quadratic function as the default.

RQ3: How efficient is TERA as compared to PBRA? In this question, we investigate the efficiency of TERA and PBRA by comparing their convergence rates.

RQ4: How do different population sizes impact the effectiveness of PBRA? By default, our approach uses an initial population size of 10. In this research question, we compare different values of the initial population size to explain why 10 is the best option.

TABLE I: Statistics of real-world web pages

| ID | URL | Category |
|----|-----|----------|
| 1 | https://www.discogs.com | Arts |
| 2 | https://xkcd.com | Arts |
| 3 | http://www.wiley.com | Shopping |
| 4 | http://forum.gsmhosting.com/vbb | Home |
| 5 | https://www.irs.gov | Home |
| 6 | https://travel.state.gov | Home |
| 7 | https://arxiv.org | Science |
| 8 | https://bitcointalk.org | Science |
| 9 | http://coinmarketcap.com | Science |
| 10 | http://www.intellicast.com | Science |
| 11 | https://www.ncbi.nlm.nih.gov | Science |
| 12 | http://sigmaaldrich.com | Science |
| 13 | http://www.weather.gov | Science |
| 14 | https://boardgamegeek.com | Games |
| 15 | http://www.finalfantasyxiv.com | Games |
| 16 | http://www.mmo-champion.com | Home |
| 17 | http://www.nexusmods.com | Games |
| 18 | http://nvidia.com | Games |
| 19 | http://www.square-enix.com | Games |
| 20 | https://www.wowprogress.com | Games |
| 21 | https://bulbagarden.net | Kids and teens |
| 22 | http://lolcounter.com | Kids and teens |
| 23 | http://www.bom.gov.au | Kids and teens |
| 24 | http://onlinelibrary.wiley.com | Business |
| 25 | http://aamc.org | Health |
| 26 | https://www.fragrantica.com | Health |
| 27 | http://us.battle.net | Kids and teens |
| 28 | http://blizzard.com | Kids and teens |
| 29 | http://drudgereport.com | News |
| 30 | https://www.irctc.co.in | Regional |
| 31 | http://dict.cc | Reference |
| 32 | https://www.leo.org | Reference |
| 33 | http://correios.com.br | Society |
| 34 | http://www.flashscore.com | Sports |
| 35 | http://letour.fr | Sports |
| 36 | http://rotoworld.com | Sports |
| 37 | http://us.soccerway.com | Sports |
| 38 | http://myway.com | Computers |

Fig. 2: Comparison of TERA, PBRA, and MFix in usability. The higher the usability score, the more mobile-friendly the page. Values above the red horizontal line drawn at 80 indicate that the GMFT considers a page to be mobile-friendly.

TABLE II: Default setting for PBRA

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Population size | $T$ | 10 |
| Number of evaluations | $E$ | 150 |
| Inertial coefficient (start) | $\omega_s$ | 0.9 |
| Inertial coefficient (end) | $\omega_e$ | 0.4 |
| Acceleration coefficient (personal) | $c_1$ | 0.5 |
| Acceleration coefficient (global) | $c_2$ | 0.5 |
| Coefficient of randomness | $r_1, r_2$ | [0, 1] |
| Neighbor range | $\Delta$ | 0.3 |
| Fitness function | $G(Y)$ | $-Y^2$ |
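To illustrate how the parameters in Table II would typically enter a particle swarm update [9], the sketch below applies the textbook velocity and position rule. It is not taken from the PBRA implementation: the linear decrease of the inertia coefficient from its start to its end value is an assumed schedule, the function names are hypothetical, and the neighbor range Δ is not modeled here.

```python
import random


def inertia(evals_used, max_evals=150, w_start=0.9, w_end=0.4):
    """Linearly interpolate the inertia coefficient (assumed schedule)."""
    frac = min(max(evals_used / max_evals, 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def update_particle(x, v, p_best, g_best, w, c1=0.5, c2=0.5, rng=random):
    """Apply one standard PSO velocity and position update to a particle.

    x, v, p_best and g_best are equal-length lists of floats.
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()  # randomness coefficients in [0, 1)
        vi = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vi)
        new_x.append(xi + vi)
    return new_x, new_v
```

Under this assumed schedule, a particle moves mostly along its previous velocity early in the search (inertia close to 0.9) and is pulled more strongly toward the personal and global bests as the evaluation budget of 150 is consumed.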

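TABLE III: Default setting for TERA

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Neighbor size | $size_{neighborhood}$ | 10 |
| Number of evaluations | $E$ | 150 |
| Size of tabu list | $size_{tabu}$ | 5 |
| Neighbor range | $\delta$ | 0.3 |
| Fitness function | $G(Y)$ | $-Y^2$ |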

C. Numerical result

RQ1: Our approach's effectiveness as compared to MFix. To answer RQ1, we compare the effectiveness of our approach, including PBRA and TERA (with the default settings in Table II and Table III), with a well-known automated repair technique for mobile-friendly problems, MFix [14]. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III will be explained by RQ3 and RQ4.

Fig. 3: Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page is. Values below the red horizontal line drawn at 100 indicate improvements over the baseline.

Fig. 4: Impact of fitness function on convergence rate. (a) Usability; (b) Aesthetics.

Experimental results show that PBRA outperforms MFix by generating repairs that have much better usability and aesthetic scores. Note that, as described in Section VI-A, the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing a web page is. From Figure 2, we can see that PBRA outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 presents the usability of the web pages repaired by TERA, PBRA, and MFix. The red line shows a threshold value of 80, which is the minimum PSIT score to be considered as having good usability by GMFT [3]. We can see that most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3, which shows the relative performance of our approaches versus the baseline technique MFix, obtained by dividing the aesthetic score of PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We normalize the results in this way because the aesthetic score has a wide range, i.e., it can take any value greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects. TERA also shows considerably better results than MFix on 24 out of 38 subjects.
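The normalization used for Figure 3 amounts to a simple ratio, sketched below under the assumption that the ratio is plotted on a percentage scale, so that the horizontal line at 100 corresponds to parity with MFix. The function names and the scores in the usage note are hypothetical.

```python
def relative_aesthetic(ours, baseline):
    """Express our aesthetic score as a percentage of the baseline's score."""
    if baseline <= 0:
        raise ValueError("baseline aesthetic score must be positive")
    return 100.0 * ours / baseline


def count_improved(our_scores, baseline_scores):
    """Count the pages on which our relative score falls below 100."""
    return sum(
        1
        for ours, base in zip(our_scores, baseline_scores)
        if relative_aesthetic(ours, base) < 100.0
    )
```

For example, a page with an aesthetic score of 43.0 under our repair and 50.0 under the baseline has a relative score of 86.0, which lies below the line and therefore counts as an improvement, since lower aesthetic scores are better.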

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results point out that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003, and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041, and effect sizes of 0.991 and 0.435, respectively).

RQ2: Impacts of the fitness function. In Section IV, we introduced a novel fitness function that combines the usability score and the aesthetic score. In the formula, the function G(Y) plays an important role in making adjustments that speed up the search for mobile-friendly patches. However, G(Y) also needs to be designed to avoid ignoring potentially good candidate patches.

We experiment with several functions that satisfy the conditions on G(Y) given in Section IV:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$

We compare the effectiveness of the fitness functions on two tasks: convergence rate, i.e., how fast a fitness function helps the approaches converge, and overall result. In terms of overall result, Figure 4 shows that the quadratic function is the most effective, followed by the cubic function and the exponential function. In terms of convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations is increased, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time to achieve better results. One potential reason could be that the exponential fitness function focuses on improving the usability score to exceed the threshold, so that it ignores potentially good candidate patches. In conclusion, the quadratic function is the best choice of G(Y) for both PBRA and TERA. We have therefore chosen the quadratic function as the default fitness function for our approach.

RQ3: Convergence rates of PBRA and TERA. RQ1 demonstrated that PBRA outperforms TERA (on 36 out of 38 subjects in the mobile-friendliness test and 30 out of 38 in aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to that shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges when the number of evaluations is 50, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA needs only 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of mobile-friendliness for repaired pages, but TERA consumes less running time.
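The "evaluations needed to reach a score of 80" figure quoted above can be read off a recorded trace of usability scores. The sketch below shows one way to compute it, assuming the trace lists the score of each evaluated candidate in order; the function name and the trace in the usage note are hypothetical.

```python
def evaluations_to_reach(trace, threshold=80.0):
    """Return the 1-based evaluation count at which the best-so-far score
    first reaches `threshold`, or None if it never does."""
    best = float("-inf")
    for n, score in enumerate(trace, start=1):
        best = max(best, score)  # track the best score seen so far
        if best >= threshold:
            return n
    return None
```

For a trace such as `[60, 72, 79, 81, 85]`, the threshold of 80 is first reached at the fourth evaluation. Comparing this count across algorithms under the same evaluation budget gives the kind of efficiency comparison reported for TERA and PBRA.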

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both of them converge when the number of evaluations is 150. The converged value of PBRA is much smaller than that of TERA. Specifically, PBRA attains an aesthetic score equal to 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of population size. Population size plays an essential role in population-based meta-heuristic methods. A small population size may not be sufficient to find good results, while a large population size may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. From the figure, we can see that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 show worse results. Small populations are more likely to lose diversity over time because mating between individuals with similar genetic structures occurs regularly. The search space of the algorithm is therefore narrowed, which usually leads to poor results, as with the population size of 5 in this case. In contrast, large populations can make the algorithm consume more computation time in finding an optimal solution, which is why the algorithm with a population size of 15 has poor results. This observation is further reinforced by the experiment on the overall result, shown in Figure 7, with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity. Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.

Internal Validity. Threats to internal validity refer to errors in our implementation and experiments. To mitigate this risk, we have carefully examined our code and experiments to avoid potential errors. We have also repeated our experiments several times to ensure that the results reported are correct.

Construct Validity. Threats to construct validity correspond to the suitability of our evaluation metrics. Following MFix, we consider a web page mobile-friendly if it passes a threshold value of 80, which is the minimum PSIT score for good usability. These ratings are, however, subject to the PSIT evaluation criteria only. Although PSIT is a popular tool for assessing usability, we plan to also include other tools to assess usability more thoroughly.

Fig. 5: Comparison of the convergence rate of TERA and PBRA. (a) Usability; (b) Aesthetics.

Fig. 6: Impact of population size on convergence rate. (a) Usability; (b) Aesthetics.

VIII. CONCLUSION

In this paper, we introduced automated approaches based on meta-heuristic algorithms for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics) at the same time. To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluations on a dataset of 38 real-world defective webpages show that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique, MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithms on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function to optimize for both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

Fig. 7: Impact of population size on overall result. (a) Usability; (b) Aesthetics.

REFERENCES

[1] "Bing Mobile Friendly Test Tool," https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved March 2020.

[2] "eMarketer Releases Updated Estimates for US Digital Users," https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved March 2020.

[3] "Google Mobile Friendly Test Tool," https://search.google.com/test/mobile-friendly, 2017, retrieved March 2020.

[4] "Google PageSpeed Insights Tool," https://developers.google.com/speed/pagespeed/insights/, 2017, retrieved March 2020.

[5] "Google 2018 Consumer Study," https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved March 2020.

[6] M. E. Akpinar and Y. Yesilada, "Vision based page segmentation algorithm: Extended and perceived success," in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, "A graph-theoretic approach to webpage segmentation," in Proceedings of the 17th International Conference on World Wide Web, ser. WWW '08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, "Tabu search - part I," ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, "S3: syntax- and semantic-guided repair synthesis via programming by examples," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, "History driven program repair," in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, "GenProg: A generic method for automatic software repair," IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, "Automated repair of mobile friendly problems in web pages," in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE '18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, "Automated repair of layout cross browser issues using search-based techniques," in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, "Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques," Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, "Angelix: Scalable multiline program patch synthesis via symbolic analysis," in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, "SemFix: Program repair via semantic analysis," in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, "Auto-locating and fix-propagating for HTML validation errors to PHP server-side code," in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, "Automated reasoning for web page layout," in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, "Automatic partitioning of web pages using clustering," in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, "Automated repair of HTML generation errors in PHP applications using string constraint solving," in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, "Block-o-matic: A web page segmentation framework," in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, "Automating presentation changes in dynamic web applications via collaborative hybrid analysis," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.


TABLE I Statistics of real-world web-pages

ID Url Categories ID Url Categories1 httpswwwdiscogscom Arts 20 httpswwwwowprogresscom Games2 httpsxkcdcom Arts 21 httpsbulbagardennet Kids and teens3 httpwwwwileycom Shopping 22 httplolcountercom Kids and teens4 httpforumgsmhostingcomvbb Home 23 httpwwwbomgovau Kids and teens5 httpswwwirsgov Home 24 httponlinelibrarywileycom Business6 httpstravelstategov Home 25 httpaamcorg Health7 httpsarxivorg Science 26 httpswwwfragranticacom Health8 httpsbitcointalkorg Science 27 httpusbattlenet Kids and teens9 httpcoinmarketcapcom Science 28 httpblizzardcom Kids and teens

10 httpwwwintellicastcom Science 29 httpdrudgereportcom News11 httpswwwncbinlmnihgov Science 30 httpswwwirctccoin Regional12 httpsigmaaldrichcom Science 31 httpdictcc Reference13 httpwwwweathergov Science 32 httpswwwleoorg Reference14 httpsboardgamegeekcom Games 33 httpcorreioscombr Society15 httpwwwfinalfantasyxivcom Game 34 httpwwwflashscorecom Sports16 httpwwwmmo-championcom Home 35 httpletourfr Sports17 httpwwwnexusmodscom Games 36 httprotoworldcom Sports18 httpnvidiacom Games 37 httpussoccerwaycom Sports19 httpwwwsquare- enixcom Games 38 httpmywaycom Computers

Fig 2 Comparison of TERA PBRA and MFix in Usability The higher the usablity score the more the mobile friendlyValues above the red horizontal line drawn at 80 indicate that the GMFT considers a page to be mobile friendly

TABLE II Default setting for PBRA

Parameters Symbol ValuePopulation size T 10Number of evaluations E 150Inertial coefficient (start) ωs 09Inertial coefficient (start) ωe 04Acceleration coefficient (persional) c1 05Acceleration coefficient (global) c2 05Coefficient of randomness r1 r2 [0 1]Neighbor range ∆ 03Fitness Function G(Y ) minusY 2

TABLE III Default setting for TERA

Parameters Symbol ValueNeighbor size sizeneighborhood 10Number of evaluations E 150Size of tabu list sizetabu 5Neighbor range δ 03Fitness Function G(Y ) minusY 2

C Numerical result

RQ1 Our approachrsquos effectiveness as compared toMFix To answer RQ1 we compare the effectiveness of ourapproach including PBRA and TERA (with default settings onTable II and Table III) with a well-known automated repair

Fig 3 Comparison of TERA and PBRA versus MFix in aesthetics The lower the aesthetic score the more the page isaesthetically pleasing Values below the red horizontal line drawn at 100 indicate improvements over baseline

(a) Usability (b) Aesthetics

Fig 4 Impact of fitness function on convergence rate

technique in mobile-friendly problems MFix [14] We alsoperformed statistical tests to study the significance of theimprovements by our approaches over the baseline Rationalebehind default settings on table II and table III will beexplained by RQ3 and RQ4

Experiment results show that PBRA outperforms MFix bygenerating repairs that have much better usability and aestheticscores Note that as we described in Section VI-A the higherthe usability score the better the usability of a web page andthe lower the aesthetic score the more aesthetically pleasinga web page is From Figure 2 we can clearly find that ourapproach namely PBRA outperforms MFix in 26 out of 38pages and achieves almost the same performance with MFixin other subject web pages The results show that PBRAis significantly better than TERA and MFix in the mobile-friendliness task

Figure 2 represents the usability of the repaired web pages

by using TERA PBRA and MFix The red line shows athreshold value of 80 - which is the minimum PSIT scoreto be considered for good usability by GMFT [3] We can seethat most of the repairs pass the PSIT threshold (36 out of 38subjects)

The numerical results concerning aesthetic score are de-picted in Figure 3 In Figure 3 we show the relative perfor-mance of our approaches versus the baseline technique MFixby dividing the aesthetics score of PBRA-patched and TERA-patched pages by that of the MFix-patched pages We do thisto normalize the results because of a wide range of aestheticsrsquovalue eg the aestheticsrsquo value can range from anywheregreater than zero Figure 3 shows that PBRA improves theaesthetic score significantly outperforming MFix on 29 outof 38 subjects in aesthetics TERA also shows considerablybetter results than MFix on 24 out of 38 subjects

We also perform Wilcoxon signed-rank test [25] to in-

vestigate the performance difference between our approachesand MFix The results point out that PBRA is significantlybetter than both TERA and MFix in usability scores (with p-values of 14 e-07 and 0003 effect-size of 0627 and 0222respectively) PBRA and TERA also are better than MFix inaesthetics (with p-values of 0004 and 0041 effect-size of0991 and 0435 respectively)

RQ2 Impacts of the fitness function In Section IV weintroduce a novel fitness function to combine usability scoreand aesthetic score In the formula G(Y ) function plays animportant role in making adjustments to speed up the searchfor friendly patches However the G(Y ) function also needsto be designed to avoid ignoring potentially good candidatepatches

We have chosen some functions which satisfy the conditionsof G(Y) (in Section IV) to experiment with such as

bull Exponential function G(Y ) = eminusY

bull Quadradtic function G(Y ) = minusY 2

bull Cubic function G(Y ) = Y 3

We compare the effectiveness of the fitness functions on twotasks convergence rate ie how fast a fitness function helpsthe approaches to converge and overall result In the overalleffect from Figure 4 we can clearly find that the quadraticfunction is the most effective followed by the cubic functionand exponential function In convergence rate tasks Figure4 shows that our approach with exponential fitness functionconverges quickly with values exceeding 80

Indeed exponential fitness function can converge well justonly within a small number of evaluations of candidate solu-tions However when the number of evaluations is increasedthere is no significant improvement Our experiments showthat cubic and quadratic fitness functions need more com-putation time to achieve better results One potential reasoncould be that the exponential fitness function focuses onimproving usability score to exceed the threshold so that itignores potentially good candidate patches In conclusion thequadratic function has the best fitness function for both PBRAand TERA We therefore have chosen the quadratic functionas the default fitness function for our approach

RQ3 Convergence rates of PBRA and TERA The firstRQ has demonstrated that PBRA outperforms TERA (36 outof 38 on the mobile-friendliness test and 30 out of 38 onaesthetics) In this RQ we further investigate the convergencerates of both PBRA and TERA to assess the efficiency of thealgorithms

Our experiments on all pages suggest a similar trendas shown in Figure 5 which is the result for the pagehttpwwwwileycom Figure 5 depicts the values of usabil-ity score aesthetic score and fitness over generations Asshown TERA converges more quickly than PBRA but PBRAachieves better values than TERA

Concerning usability score TERA converges when the num-ber of evaluations is 50 while PBRA needs 150 evaluationsto converge (see Figure 5a) TERA only needs 30 evaluationsto reach score 80 - which is the threshold to pass GMFT [3]

while PBRA needs 50 evaluations This means that TERA andPBRA can guarantee a high level of friendliness for repairedpages but TERA consumes less running time

The convergence times of TERA and PBRA concerning theaesthetic score are similar as depicted in Figure 5b Both ofthem converge when the number of evaluations is 150 Theconverged value of PBRA is much smaller than that of TERASpecifically PBRA attains the aesthetic score that equals to86 that of TERA This result means that PBRA can providea better aesthetic than TERA

RQ4 Impacts of Population Size Population size playsan essential role in population-based meta-heuristic methodsA small population size may not be sufficient to find goodresults while a large population size may hamper the conver-gence rate Therefore we perform experiments with differentpopulation sizes on two tasks the convergence rate and overallresult Figure 6 shows the impact of the population size onthe convergence rate of the algorithm From the figure wecan clearly find that the population size of 10 is the mostsuitable value of the approach while the population size of5 and population size of 15 shows the worse result Smallpopulations are more likely to experience the loss of diversityover time because the mating between individuals with similargenetic structures occurred regularly Therefore the searchspace of the algorithm is narrowed which usually leads to poorresults like the result of the population size of 5 in this case Incontrast large populations could make the algorithm consumemore computation time in finding an optimal solution That isthe reason why algorithms with a population size of 15 havepoor results This observation is further reinforced through theexperiment on the overall result which is shown in Figure 7with five subjects and five different values of population size

VII THREATS TO VALIDITY

External Validity Threats to external validity correspondto the generalizability of our findings Our evaluation datasetcontains 38 real-world web pages Still these subjects maynot represent all possible cases in reality and thus our resultsmay not generalize We tried to mitigate this by selectingtop-ranked websites from various categories We plan toexperiment on a larger dataset in the future

Internal Validity Threats to internal validity refers to errorsin our implementation and experiments To mitigate this riskwe have carefully examined our code and experiments to avoidpotential errors We have also repeated our experiments severaltimes to ensure that the results reported are correct

Construct Validity Threats to construct validity correspondto the suitability of our evaluation metrics Following MFixwe consider web pages as mobile-friendly if it passes athreshold value of 80 which is the minimum PSIT score forgood usability These ratings are however subject to the PSITevaluation criteria only Although PSIT is a popular tool toassess usability we plan to also include other tools to morethoroughly assess usability

(a) Usability (b) Aesthetics

Fig 5 Comparison on covergence rate of TERA and PBRA

(a) Usability (b) Aesthetic

Fig 6 Impact of population size on covergence rate

VIII CONCLUSION

In this paper we introduced automated approaches basedon a meta-heuristic algorithm for repairing mobile-friendlyproblems in web pages Our approaches strive for improv-ing mobile-friendliness (usability) while minimizing layoutdisruption (aesthetic) at the same time To achieve this wedesigned a new novel fitness function to allow our searchalgorithms to find better repairs more efficiently and effec-tively Our evaluations on a data set of 38 real-world defectivewebpages show that our approaches generate good mobile-friendly patches for 94 of the subjects and significantlyoutperform existing state-of-the-art baseline technique MFixWe have also studied the impact of different fitness functionsand parameters of the meta-heuristic algorithm on the overallresult and convergence rate Overall experimental results are

promising and indicate the usefulness of our approacheswhich may support developers in designing mobile-friendlyweb pages

In the future we plan to improve the effectiveness andefficiency of our solution further by designing better fitnessfunction to better optimize for both usability and aestheticWe also plan to curate more data to expand our experimentson a larger dataset containing many more real-world projectsFurthermore we plan to integrate our approach into develop-ment pipeline to interactively help developers develop betterweb pages

IX. ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

(a) Usability (b) Aesthetics

Fig. 7. Impact of population size on overall result.

REFERENCES

[1] “Bing Mobile Friendly Test Tool,” https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved March 2020.

[2] “eMarketer Releases Updated Estimates for US Digital Users,” https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved March 2020.

[3] “Google Mobile Friendly Test Tool,” https://search.google.com/test/mobile-friendly, 2017, retrieved March 2020.

[4] “Google PageSpeed Insights Tool,” https://developers.google.com/speed/pagespeed/insights/, 2017, retrieved March 2020.

[5] “Google 2018 Consumer Study,” https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved March 2020.

[6] M. E. Akpinar and Y. Yesilada, “Vision based page segmentation algorithm: Extended and perceived success,” in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, “A graph-theoretic approach to webpage segmentation,” in Proceedings of the 17th International Conference on World Wide Web, ser. WWW ’08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, “Tabu search - part I,” ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, “S3: syntax- and semantic-guided repair synthesis via programming by examples,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, “History driven program repair,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “Genprog: A generic method for automatic software repair,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, “Automated repair of mobile friendly problems in web pages,” in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE ’18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, “Automated repair of layout cross browser issues using search-based techniques,” in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, “Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques,” Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, “Angelix: Scalable multiline program patch synthesis via symbolic analysis,” in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, “Semfix: Program repair via semantic analysis,” in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Auto-locating and fix-propagating for html validation errors to php server-side code,” in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, “Automated reasoning for web page layout,” in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, “Automatic partitioning of web pages using clustering,” in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, “Automated repair of html generation errors in php applications using string constraint solving,” in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, “Block-o-matic: A web page segmentation framework,” in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, “Automating presentation changes in dynamic web applications via collaborative hybrid analysis,” in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.


Fig. 3. Comparison of TERA and PBRA versus MFix in aesthetics. The lower the aesthetic score, the more aesthetically pleasing the page. Values below the red horizontal line drawn at 100% indicate improvements over the baseline.

(a) Usability (b) Aesthetics

Fig. 4. Impact of fitness function on convergence rate.

technique in mobile-friendly problems, MFix [14]. We also performed statistical tests to study the significance of the improvements of our approaches over the baseline. The rationale behind the default settings in Table II and Table III is explained in RQ3 and RQ4.

Experimental results show that PBRA outperforms MFix by generating repairs that have much better usability and aesthetic scores. Note that, as described in Section VI-A, the higher the usability score, the better the usability of a web page, and the lower the aesthetic score, the more aesthetically pleasing the web page is. From Figure 2, we can see that our approach, namely PBRA, outperforms MFix on 26 out of 38 pages and achieves almost the same performance as MFix on the other subject web pages. The results show that PBRA is significantly better than TERA and MFix in the mobile-friendliness task.

Figure 2 presents the usability of the web pages repaired by TERA, PBRA, and MFix. The red line shows a threshold value of 80, which is the minimum PSIT score to be considered good usability by GMFT [3]. We can see that most of the repairs pass the PSIT threshold (36 out of 38 subjects).

The numerical results concerning the aesthetic score are depicted in Figure 3. In Figure 3, we show the relative performance of our approaches versus the baseline technique MFix by dividing the aesthetic score of PBRA-patched and TERA-patched pages by that of the MFix-patched pages. We do this to normalize the results because of the wide range of aesthetic values, e.g., the aesthetic value can be anywhere greater than zero. Figure 3 shows that PBRA improves the aesthetic score significantly, outperforming MFix on 29 out of 38 subjects in aesthetics. TERA also shows considerably better results than MFix on 24 out of 38 subjects.
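The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the scores are made up rather than taken from the paper, and the ratio is expressed as a percentage on the assumption that the line at 100 in Figure 3 denotes 100%.

```python
from typing import List


def relative_aesthetic_scores(ours: List[float], baseline: List[float]) -> List[float]:
    """Return ours[i] / baseline[i] as a percentage for each subject page."""
    if len(ours) != len(baseline):
        raise ValueError("score lists must have the same length")
    ratios = []
    for our_score, base_score in zip(ours, baseline):
        if base_score <= 0:
            # Assumption from the text: aesthetic scores are greater than zero.
            raise ValueError("aesthetic scores are assumed to be positive")
        ratios.append(100.0 * our_score / base_score)
    return ratios


def count_improved(ratios: List[float]) -> int:
    """Count subjects whose ratio is below 100%, i.e. better than the baseline."""
    return sum(1 for r in ratios if r < 100.0)


# Made-up scores for three hypothetical pages (not data from the paper).
ours_scores = [12.0, 40.0, 7.5]
mfix_scores = [24.0, 32.0, 7.5]
ratios = relative_aesthetic_scores(ours_scores, mfix_scores)
print(ratios, count_improved(ratios))
```

Because a lower aesthetic score is better, a ratio below 100% means the patched page is less disrupted than the MFix-patched page, and a ratio of exactly 100% counts as a tie rather than an improvement.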

We also perform the Wilcoxon signed-rank test [25] to investigate the performance difference between our approaches and MFix. The results indicate that PBRA is significantly better than both TERA and MFix in usability scores (with p-values of 1.4e-07 and 0.003, and effect sizes of 0.627 and 0.222, respectively). PBRA and TERA are also better than MFix in aesthetics (with p-values of 0.004 and 0.041, and effect sizes of 0.991 and 0.435, respectively).
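For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Wilcoxon signed-rank test on paired scores. It is not the authors' analysis script: it uses the normal approximation without continuity or tie correction, which is rough for small samples, and the sample scores are invented for illustration.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped; tied absolute differences
    receive the average of the ranks they span.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis W is approximately normal with these moments.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w, p


# Invented usability scores for eight hypothetical pages.
scores_a = [85, 90, 78, 92, 88, 95, 81, 84]
scores_b = [80, 86, 79, 85, 80, 89, 83, 70]
print(wilcoxon_signed_rank(scores_a, scores_b))
```

The test is paired and rank-based, so it fits a setting where each subject page is repaired by both techniques and the score distributions are not assumed to be normal.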

RQ2: Impacts of the fitness function. In Section IV, we introduced a novel fitness function that combines the usability score and the aesthetic score. In the formula, the function G(Y) plays an important role in making adjustments that speed up the search for mobile-friendly patches. However, G(Y) also needs to be designed to avoid ignoring potentially good candidate patches.

We have chosen several functions that satisfy the conditions on G(Y) (see Section IV) to experiment with, namely:

• Exponential function: $G(Y) = e^{-Y}$

• Quadratic function: $G(Y) = -Y^2$

• Cubic function: $G(Y) = Y^3$
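To make the three candidate shapes concrete, the sketch below tabulates them at a few sample inputs. What Y measures and how G enters the full fitness function are defined in Section IV and are not reproduced here, so the sample inputs and the names `CANDIDATE_G` and `tabulate` are purely illustrative assumptions.

```python
import math
from typing import Callable, Dict, List

# The three candidate adjustment functions listed above.
CANDIDATE_G: Dict[str, Callable[[float], float]] = {
    "exponential": lambda y: math.exp(-y),
    "quadratic": lambda y: -(y ** 2),
    "cubic": lambda y: y ** 3,
}


def tabulate(values: List[float]) -> Dict[str, List[float]]:
    """Evaluate every candidate G at each sample value of Y."""
    return {name: [g(y) for y in values] for name, g in CANDIDATE_G.items()}


# Arbitrary sample inputs, chosen only to show how the shapes differ.
for name, outputs in tabulate([0.0, 0.5, 1.0, 2.0]).items():
    print(name, outputs)
```

Keeping the candidates in a dictionary makes it straightforward to run the same search once per function, which is the kind of comparison RQ2 reports.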

We compare the effectiveness of the fitness functions on two tasks: convergence rate, i.e., how fast a fitness function helps the approaches converge, and overall result. For the overall result, Figure 4 shows that the quadratic function is the most effective, followed by the cubic function and the exponential function. For the convergence rate, Figure 4 shows that our approach with the exponential fitness function converges quickly, with values exceeding 80.

Indeed, the exponential fitness function converges well within only a small number of evaluations of candidate solutions. However, when the number of evaluations increases, there is no significant improvement. Our experiments show that the cubic and quadratic fitness functions need more computation time to achieve better results. One potential reason is that the exponential fitness function focuses on improving the usability score to exceed the threshold, so it ignores potentially good candidate patches. In conclusion, the quadratic function is the best fitness function for both PBRA and TERA. We have therefore chosen the quadratic function as the default fitness function for our approach.

RQ3: Convergence rates of PBRA and TERA. The first RQ demonstrated that PBRA outperforms TERA (on 36 out of 38 subjects in the mobile-friendliness test and 30 out of 38 in aesthetics). In this RQ, we further investigate the convergence rates of both PBRA and TERA to assess the efficiency of the algorithms.

Our experiments on all pages suggest a trend similar to that shown in Figure 5, which is the result for the page http://www.wiley.com. Figure 5 depicts the values of the usability score, aesthetic score, and fitness over generations. As shown, TERA converges more quickly than PBRA, but PBRA achieves better values than TERA.

Concerning the usability score, TERA converges when the number of evaluations is 50, while PBRA needs 150 evaluations to converge (see Figure 5a). TERA needs only 30 evaluations to reach a score of 80, which is the threshold to pass GMFT [3], while PBRA needs 50 evaluations. This means that both TERA and PBRA can guarantee a high level of mobile-friendliness for repaired pages, but TERA consumes less running time.
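The two quantities compared above, evaluations needed to reach the threshold and evaluations needed to converge, can be read off a best-so-far score trace. The sketch below assumes such a trace is available as a non-decreasing list of usability scores, one entry per evaluation; the function names and the sample trace are hypothetical and not taken from the paper.

```python
from typing import List, Optional


def evaluations_to_threshold(best_scores: List[float], threshold: float = 80.0) -> Optional[int]:
    """Return the 1-based evaluation count at which the best score first
    reaches the threshold, or None if it never does."""
    for count, score in enumerate(best_scores, start=1):
        if score >= threshold:
            return count
    return None


def convergence_point(best_scores: List[float], tolerance: float = 1e-9) -> int:
    """Return the 1-based evaluation count after which the best score is
    within `tolerance` of its final value (assumes a non-decreasing trace)."""
    if not best_scores:
        raise ValueError("trace must not be empty")
    final = best_scores[-1]
    for count, score in enumerate(best_scores, start=1):
        if final - score <= tolerance:
            return count
    return len(best_scores)


# Invented best-so-far usability trace for one hypothetical run.
trace = [60, 72, 79, 81, 81, 85, 85, 85]
print(evaluations_to_threshold(trace), convergence_point(trace))
```

Separating the two measures matters for the comparison in this RQ: an algorithm can cross the threshold of 80 early yet keep improving for many more evaluations, which is the pattern reported for PBRA relative to TERA.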

The convergence times of TERA and PBRA concerning the aesthetic score are similar, as depicted in Figure 5b. Both converge when the number of evaluations is 150. The converged value of PBRA is much smaller than that of TERA. Specifically, PBRA attains an aesthetic score equal to 86% of that of TERA. This result means that PBRA can provide better aesthetics than TERA.

RQ4: Impacts of Population Size. Population size plays an essential role in population-based meta-heuristic methods. A small population size may not be sufficient to find good results, while a large population size may hamper the convergence rate. Therefore, we perform experiments with different population sizes on two tasks: the convergence rate and the overall result. Figure 6 shows the impact of the population size on the convergence rate of the algorithm. From the figure, we can see that a population size of 10 is the most suitable value for the approach, while population sizes of 5 and 15 show worse results. Small populations are more likely to lose diversity over time because mating between individuals with similar genetic structures occurs regularly. The search space of the algorithm is therefore narrowed, which usually leads to poor results, like the result for the population size of 5 in this case. In contrast, large populations can make the algorithm consume more computation time in finding an optimal solution. That is why the algorithm with a population size of 15 has poor results. This observation is further reinforced by the experiment on the overall result, shown in Figure 7, with five subjects and five different values of population size.

VII. THREATS TO VALIDITY

External Validity. Threats to external validity correspond to the generalizability of our findings. Our evaluation dataset contains 38 real-world web pages. Still, these subjects may not represent all possible cases in reality, and thus our results may not generalize. We tried to mitigate this by selecting top-ranked websites from various categories. We plan to experiment on a larger dataset in the future.


vestigate the performance difference between our approachesand MFix The results point out that PBRA is significantlybetter than both TERA and MFix in usability scores (with p-values of 14 e-07 and 0003 effect-size of 0627 and 0222respectively) PBRA and TERA also are better than MFix inaesthetics (with p-values of 0004 and 0041 effect-size of0991 and 0435 respectively)

RQ2 Impacts of the fitness function In Section IV weintroduce a novel fitness function to combine usability scoreand aesthetic score In the formula G(Y ) function plays animportant role in making adjustments to speed up the searchfor friendly patches However the G(Y ) function also needsto be designed to avoid ignoring potentially good candidatepatches

We have chosen some functions which satisfy the conditionsof G(Y) (in Section IV) to experiment with such as

bull Exponential function G(Y ) = eminusY

bull Quadradtic function G(Y ) = minusY 2

bull Cubic function G(Y ) = Y 3

We compare the effectiveness of the fitness functions on twotasks convergence rate ie how fast a fitness function helpsthe approaches to converge and overall result In the overalleffect from Figure 4 we can clearly find that the quadraticfunction is the most effective followed by the cubic functionand exponential function In convergence rate tasks Figure4 shows that our approach with exponential fitness functionconverges quickly with values exceeding 80

Indeed exponential fitness function can converge well justonly within a small number of evaluations of candidate solu-tions However when the number of evaluations is increasedthere is no significant improvement Our experiments showthat cubic and quadratic fitness functions need more com-putation time to achieve better results One potential reasoncould be that the exponential fitness function focuses onimproving usability score to exceed the threshold so that itignores potentially good candidate patches In conclusion thequadratic function has the best fitness function for both PBRAand TERA We therefore have chosen the quadratic functionas the default fitness function for our approach

RQ3 Convergence rates of PBRA and TERA The firstRQ has demonstrated that PBRA outperforms TERA (36 outof 38 on the mobile-friendliness test and 30 out of 38 onaesthetics) In this RQ we further investigate the convergencerates of both PBRA and TERA to assess the efficiency of thealgorithms

Our experiments on all pages suggest a similar trendas shown in Figure 5 which is the result for the pagehttpwwwwileycom Figure 5 depicts the values of usabil-ity score aesthetic score and fitness over generations Asshown TERA converges more quickly than PBRA but PBRAachieves better values than TERA

Concerning usability score TERA converges when the num-ber of evaluations is 50 while PBRA needs 150 evaluationsto converge (see Figure 5a) TERA only needs 30 evaluationsto reach score 80 - which is the threshold to pass GMFT [3]

while PBRA needs 50 evaluations This means that TERA andPBRA can guarantee a high level of friendliness for repairedpages but TERA consumes less running time

The convergence times of TERA and PBRA concerning theaesthetic score are similar as depicted in Figure 5b Both ofthem converge when the number of evaluations is 150 Theconverged value of PBRA is much smaller than that of TERASpecifically PBRA attains the aesthetic score that equals to86 that of TERA This result means that PBRA can providea better aesthetic than TERA

RQ4 Impacts of Population Size Population size playsan essential role in population-based meta-heuristic methodsA small population size may not be sufficient to find goodresults while a large population size may hamper the conver-gence rate Therefore we perform experiments with differentpopulation sizes on two tasks the convergence rate and overallresult Figure 6 shows the impact of the population size onthe convergence rate of the algorithm From the figure wecan clearly find that the population size of 10 is the mostsuitable value of the approach while the population size of5 and population size of 15 shows the worse result Smallpopulations are more likely to experience the loss of diversityover time because the mating between individuals with similargenetic structures occurred regularly Therefore the searchspace of the algorithm is narrowed which usually leads to poorresults like the result of the population size of 5 in this case Incontrast large populations could make the algorithm consumemore computation time in finding an optimal solution That isthe reason why algorithms with a population size of 15 havepoor results This observation is further reinforced through theexperiment on the overall result which is shown in Figure 7with five subjects and five different values of population size

VII THREATS TO VALIDITY

External Validity Threats to external validity correspondto the generalizability of our findings Our evaluation datasetcontains 38 real-world web pages Still these subjects maynot represent all possible cases in reality and thus our resultsmay not generalize We tried to mitigate this by selectingtop-ranked websites from various categories We plan toexperiment on a larger dataset in the future

Internal Validity Threats to internal validity refers to errorsin our implementation and experiments To mitigate this riskwe have carefully examined our code and experiments to avoidpotential errors We have also repeated our experiments severaltimes to ensure that the results reported are correct

Construct Validity Threats to construct validity correspondto the suitability of our evaluation metrics Following MFixwe consider web pages as mobile-friendly if it passes athreshold value of 80 which is the minimum PSIT score forgood usability These ratings are however subject to the PSITevaluation criteria only Although PSIT is a popular tool toassess usability we plan to also include other tools to morethoroughly assess usability

(a) Usability (b) Aesthetics

Fig 5 Comparison on covergence rate of TERA and PBRA

(a) Usability (b) Aesthetic

Fig 6 Impact of population size on covergence rate

VIII CONCLUSION

In this paper we introduced automated approaches basedon a meta-heuristic algorithm for repairing mobile-friendlyproblems in web pages Our approaches strive for improv-ing mobile-friendliness (usability) while minimizing layoutdisruption (aesthetic) at the same time To achieve this wedesigned a new novel fitness function to allow our searchalgorithms to find better repairs more efficiently and effec-tively Our evaluations on a data set of 38 real-world defectivewebpages show that our approaches generate good mobile-friendly patches for 94 of the subjects and significantlyoutperform existing state-of-the-art baseline technique MFixWe have also studied the impact of different fitness functionsand parameters of the meta-heuristic algorithm on the overallresult and convergence rate Overall experimental results are

promising and indicate the usefulness of our approacheswhich may support developers in designing mobile-friendlyweb pages

In the future we plan to improve the effectiveness andefficiency of our solution further by designing better fitnessfunction to better optimize for both usability and aestheticWe also plan to curate more data to expand our experimentson a larger dataset containing many more real-world projectsFurthermore we plan to integrate our approach into develop-ment pipeline to interactively help developers develop betterweb pages

IX ACKNOWLEDGEMENTS

We would like to thank Mahajan et al for sharing theirimplementation of MFix [14] and guiding us through theexperimental setup for MFix

(a) Usability (b) Aesthetic

Fig 7 Impact of population size on overall result

REFERENCES

[1] ldquoBing Mobile Friendly Test Toolrdquo httpswwwbingcomwebmastertoolsmobile-friendliness 2017 retrieved March 2020

[2] ldquoeMarketer Releases Updated Estimates for USDigital Usersrdquo httpswwwemarketercomArticleeMarketer-Releases-Updated-Estimates-US-Digital-Users10152752017 retrieved March 2020

[3] ldquoGoogle Mobile Friendly Test Toolrdquo httpssearchgooglecomtestmobile-friendly 2017 retrieved March 2020

[4] ldquoGoogle PageSpeed Insights Toolrdquo httpsdevelopersgooglecomspeedpagespeedinsights 2017 retrieved March 2020

[5] ldquoGoogle 2018 Consumer Studyrdquo httpswwwconsumerbarometercomeninsightscountryCode=US 2018 retrieved March 2020

[6] M E Akpinar and Y Yesilada ldquoVision based page segmentationalgorithm Extended and perceived successrdquo in Revised Selected Papersof the ICWE 2013 International Workshops on Current Trends in WebEngineering - Volume 8295 Berlin Heidelberg Springer-Verlag 2013pp 238ndash252

[7] D Chakrabarti R Kumar and K Punera ldquoA graph-theoretic approachto webpage segmentationrdquo in Proceedings of the 17th InternationalConference on World Wide Web ser WWW rsquo08 New York NYUSA ACM 2008 pp 377ndash386

[8] F Glover ldquoTabu searchmdashpart irdquo ORSA Journal on computing vol 1no 3 pp 190ndash206 1989

[9] J Kennedy and R C Eberhart ldquoParticle swarm optimizationrdquo inProceedings of the IEEE International Conference on Neural Networks1995 pp 1942ndash1948

[10] J R Koza Genetic Programming On the Programming of Computersby Means of Natural Selection Cambridge MA USA MIT Press1992

[11] X-B D Le D-H Chu D Lo C Le Goues and W Visser ldquoS3 syntax-and semantic-guided repair synthesis via programming by examplesrdquo inProceedings of the 2017 11th Joint Meeting on Foundations of SoftwareEngineering 2017 pp 593ndash604

[12] X B D Le D Lo and C Le Goues ldquoHistory driven program repairrdquoin 2016 IEEE 23rd International Conference on Software AnalysisEvolution and Reengineering (SANER) vol 1 IEEE 2016 pp 213ndash224

[13] C Le Goues T Nguyen S Forrest and W Weimer ldquoGenprog Ageneric method for automatic software repairrdquo Ieee transactions onsoftware engineering vol 38 no 1 pp 54ndash72 2011

[14] S Mahajan N Abolhassani P McMinn and W G J Halfond ldquoAuto-mated repair of mobile friendly problems in web pagesrdquo in Proceedingsof the 40th International Conference on Software Engineering ser ICSErsquo18 New York NY USA ACM 2018 pp 140ndash150

[15] S Mahajan A Alameer P McMinn and W G J HalfondldquoAutomated repair of layout cross browser issues using search-basedtechniquesrdquo in Proceedings of the 26th ACM SIGSOFT InternationalSymposium on Software Testing and Analysis ser ISSTA 2017 NewYork NY USA ACM 2017 pp 249ndash260 [Online] Availablehttpdoiacmorg10114530927033092726

[16] S Mahajan A Alameer P McMinn and W G Halfond ldquoEffectiveautomated repair of internationalization presentation failures in web ap-plications using style similarity clustering and search-based techniquesrdquoSoftware Testing Verification and Reliability vol 31 no 1-2 p e17462021

[17] S Mechtaev J Yi and A Roychoudhury ldquoAngelix Scalable multilineprogram patch synthesis via symbolic analysisrdquo in Proceedings of the38th international conference on software engineering 2016 pp 691ndash701

[18] H D T Nguyen D Qi A Roychoudhury and S Chandra ldquoSemfixProgram repair via semantic analysisrdquo in 2013 35th InternationalConference on Software Engineering (ICSE) IEEE 2013 pp 772ndash781

[19] H V Nguyen H A Nguyen T T Nguyen and T N Nguyen ldquoAuto-locating and fix-propagating for html validation errors to php server-sidecoderdquo in 2011 26th IEEEACM International Conference on AutomatedSoftware Engineering (ASE 2011) IEEE 2011 pp 13ndash22

[20] P Panchekha and E Torlak ldquoAutomated reasoning for web page layoutrdquoin Proceedings of the 2016 ACM SIGPLAN International Conference onObject-Oriented Programming Systems Languages and Applications2016 pp 181ndash194

[21] R Romero and A Berger ldquoAutomatic partitioning of web pages usingclusteringrdquo in Mobile Human-Computer Interaction - MobileHCI 2004Berlin Heidelberg Springer Berlin Heidelberg 2004 pp 388ndash393

[22] H Samimi M Schafer S Artzi T Millstein F Tip and L HendrenldquoAutomated repair of html generation errors in php applications usingstring constraint solvingrdquo in 2012 34th International Conference onSoftware Engineering (ICSE) IEEE 2012 pp 277ndash287

[23] A Sanoja and S Gancarski ldquoBlock-o-matic A web page segmentationframeworkrdquo in 2014 International Conference on Multimedia Comput-ing and Systems (ICMCS) April 2014 pp 595ndash600

[24] X Wang L Zhang T Xie Y Xiong and H Mei ldquoAutomating pre-sentation changes in dynamic web applications via collaborative hybridanalysisrdquo in Proceedings of the ACM SIGSOFT 20th InternationalSymposium on the Foundations of Software Engineering 2012 pp 1ndash11

[25] F Wilcoxon Individual Comparisons by Ranking Methods New YorkNY Springer New York 1992 pp 196ndash202

(a) Usability (b) Aesthetics

Fig. 5. Comparison of the convergence rate of TERA and PBRA.

(a) Usability (b) Aesthetics

Fig. 6. Impact of population size on convergence rate.

VIII CONCLUSION

In this paper, we introduced automated approaches based on a meta-heuristic algorithm for repairing mobile-friendly problems in web pages. Our approaches strive to improve mobile-friendliness (usability) while minimizing layout disruption (aesthetics). To achieve this, we designed a novel fitness function that allows our search algorithms to find better repairs more efficiently and effectively. Our evaluation on a dataset of 38 real-world defective web pages shows that our approaches generate good mobile-friendly patches for 94% of the subjects and significantly outperform the existing state-of-the-art baseline technique, MFix. We have also studied the impact of different fitness functions and parameters of the meta-heuristic algorithm on the overall result and convergence rate. Overall, the experimental results are promising and indicate the usefulness of our approaches, which may support developers in designing mobile-friendly web pages.

In the future, we plan to further improve the effectiveness and efficiency of our solution by designing a better fitness function that optimizes both usability and aesthetics. We also plan to curate more data to expand our experiments to a larger dataset containing many more real-world projects. Furthermore, we plan to integrate our approach into the development pipeline to interactively help developers build better web pages.

IX ACKNOWLEDGEMENTS

We would like to thank Mahajan et al. for sharing their implementation of MFix [14] and guiding us through the experimental setup for MFix.

(a) Usability (b) Aesthetics

Fig. 7. Impact of population size on overall result.

REFERENCES

[1] “Bing Mobile Friendly Test Tool,” https://www.bing.com/webmaster/tools/mobile-friendliness, 2017, retrieved March 2020.

[2] “eMarketer Releases Updated Estimates for US Digital Users,” https://www.emarketer.com/Article/eMarketer-Releases-Updated-Estimates-US-Digital-Users/1015275, 2017, retrieved March 2020.

[3] “Google Mobile Friendly Test Tool,” https://search.google.com/test/mobile-friendly, 2017, retrieved March 2020.

[4] “Google PageSpeed Insights Tool,” https://developers.google.com/speed/pagespeed/insights/, 2017, retrieved March 2020.

[5] “Google 2018 Consumer Study,” https://www.consumerbarometer.com/en/insights/?countryCode=US, 2018, retrieved March 2020.

[6] M. E. Akpinar and Y. Yesilada, “Vision based page segmentation algorithm: Extended and perceived success,” in Revised Selected Papers of the ICWE 2013 International Workshops on Current Trends in Web Engineering - Volume 8295. Berlin, Heidelberg: Springer-Verlag, 2013, pp. 238–252.

[7] D. Chakrabarti, R. Kumar, and K. Punera, “A graph-theoretic approach to webpage segmentation,” in Proceedings of the 17th International Conference on World Wide Web, ser. WWW ’08. New York, NY, USA: ACM, 2008, pp. 377–386.

[8] F. Glover, “Tabu search - part I,” ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[9] J. Kennedy and R. C. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[10] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.

[11] X.-B. D. Le, D.-H. Chu, D. Lo, C. Le Goues, and W. Visser, “S3: syntax- and semantic-guided repair synthesis via programming by examples,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017, pp. 593–604.

[12] X. B. D. Le, D. Lo, and C. Le Goues, “History driven program repair,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1. IEEE, 2016, pp. 213–224.

[13] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automatic software repair,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2011.

[14] S. Mahajan, N. Abolhassani, P. McMinn, and W. G. J. Halfond, “Automated repair of mobile friendly problems in web pages,” in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE ’18. New York, NY, USA: ACM, 2018, pp. 140–150.

[15] S. Mahajan, A. Alameer, P. McMinn, and W. G. J. Halfond, “Automated repair of layout cross browser issues using search-based techniques,” in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2017. New York, NY, USA: ACM, 2017, pp. 249–260. [Online]. Available: http://doi.acm.org/10.1145/3092703.3092726

[16] S. Mahajan, A. Alameer, P. McMinn, and W. G. Halfond, “Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search-based techniques,” Software Testing, Verification and Reliability, vol. 31, no. 1-2, p. e1746, 2021.

[17] S. Mechtaev, J. Yi, and A. Roychoudhury, “Angelix: Scalable multiline program patch synthesis via symbolic analysis,” in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 691–701.

[18] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra, “SemFix: Program repair via semantic analysis,” in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 772–781.

[19] H. V. Nguyen, H. A. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Auto-locating and fix-propagating for HTML validation errors to PHP server-side code,” in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 2011, pp. 13–22.

[20] P. Panchekha and E. Torlak, “Automated reasoning for web page layout,” in Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016, pp. 181–194.

[21] R. Romero and A. Berger, “Automatic partitioning of web pages using clustering,” in Mobile Human-Computer Interaction - MobileHCI 2004. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 388–393.

[22] H. Samimi, M. Schafer, S. Artzi, T. Millstein, F. Tip, and L. Hendren, “Automated repair of HTML generation errors in PHP applications using string constraint solving,” in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 277–287.

[23] A. Sanoja and S. Gancarski, “Block-o-Matic: A web page segmentation framework,” in 2014 International Conference on Multimedia Computing and Systems (ICMCS), April 2014, pp. 595–600.

[24] X. Wang, L. Zhang, T. Xie, Y. Xiong, and H. Mei, “Automating presentation changes in dynamic web applications via collaborative hybrid analysis,” in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, 2012, pp. 1–11.

[25] F. Wilcoxon, Individual Comparisons by Ranking Methods. New York, NY: Springer New York, 1992, pp. 196–202.

  • I Introduction
  • II Related Work
  • III Our Framework
  • IV Formulation and Fitness Function
  • V Proposed Algorithms
    • V-A PBRA A pso Based Repair Algorithm
    • V-B TERA a Time Efficient Repair Algorithm
    • V-C Implementation Detail
      • VI Experimental Result
        • VI-A Dataset and Evaluation metrics
        • VI-B Research Question
        • VI-C Numerical result
          • VII Threats to validity
          • VIII Conclusion
          • IX Acknowledgements
          • References

(a) Usability (b) Aesthetic

Fig 7 Impact of population size on overall result

REFERENCES

[1] ldquoBing Mobile Friendly Test Toolrdquo httpswwwbingcomwebmastertoolsmobile-friendliness 2017 retrieved March 2020

[2] ldquoeMarketer Releases Updated Estimates for USDigital Usersrdquo httpswwwemarketercomArticleeMarketer-Releases-Updated-Estimates-US-Digital-Users10152752017 retrieved March 2020

[3] ldquoGoogle Mobile Friendly Test Toolrdquo httpssearchgooglecomtestmobile-friendly 2017 retrieved March 2020

[4] ldquoGoogle PageSpeed Insights Toolrdquo httpsdevelopersgooglecomspeedpagespeedinsights 2017 retrieved March 2020

[5] ldquoGoogle 2018 Consumer Studyrdquo httpswwwconsumerbarometercomeninsightscountryCode=US 2018 retrieved March 2020

[6] M E Akpinar and Y Yesilada ldquoVision based page segmentationalgorithm Extended and perceived successrdquo in Revised Selected Papersof the ICWE 2013 International Workshops on Current Trends in WebEngineering - Volume 8295 Berlin Heidelberg Springer-Verlag 2013pp 238ndash252

[7] D Chakrabarti R Kumar and K Punera ldquoA graph-theoretic approachto webpage segmentationrdquo in Proceedings of the 17th InternationalConference on World Wide Web ser WWW rsquo08 New York NYUSA ACM 2008 pp 377ndash386

[8] F Glover ldquoTabu searchmdashpart irdquo ORSA Journal on computing vol 1no 3 pp 190ndash206 1989

[9] J Kennedy and R C Eberhart ldquoParticle swarm optimizationrdquo inProceedings of the IEEE International Conference on Neural Networks1995 pp 1942ndash1948

[10] J R Koza Genetic Programming On the Programming of Computersby Means of Natural Selection Cambridge MA USA MIT Press1992

[11] X-B D Le D-H Chu D Lo C Le Goues and W Visser ldquoS3 syntax-and semantic-guided repair synthesis via programming by examplesrdquo inProceedings of the 2017 11th Joint Meeting on Foundations of SoftwareEngineering 2017 pp 593ndash604

[12] X B D Le D Lo and C Le Goues ldquoHistory driven program repairrdquoin 2016 IEEE 23rd International Conference on Software AnalysisEvolution and Reengineering (SANER) vol 1 IEEE 2016 pp 213ndash224

[13] C Le Goues T Nguyen S Forrest and W Weimer ldquoGenprog Ageneric method for automatic software repairrdquo Ieee transactions onsoftware engineering vol 38 no 1 pp 54ndash72 2011

[14] S Mahajan N Abolhassani P McMinn and W G J Halfond ldquoAuto-mated repair of mobile friendly problems in web pagesrdquo in Proceedingsof the 40th International Conference on Software Engineering ser ICSErsquo18 New York NY USA ACM 2018 pp 140ndash150

[15] S Mahajan A Alameer P McMinn and W G J HalfondldquoAutomated repair of layout cross browser issues using search-basedtechniquesrdquo in Proceedings of the 26th ACM SIGSOFT InternationalSymposium on Software Testing and Analysis ser ISSTA 2017 NewYork NY USA ACM 2017 pp 249ndash260 [Online] Availablehttpdoiacmorg10114530927033092726

[16] S Mahajan A Alameer P McMinn and W G Halfond ldquoEffectiveautomated repair of internationalization presentation failures in web ap-plications using style similarity clustering and search-based techniquesrdquoSoftware Testing Verification and Reliability vol 31 no 1-2 p e17462021

[17] S Mechtaev J Yi and A Roychoudhury ldquoAngelix Scalable multilineprogram patch synthesis via symbolic analysisrdquo in Proceedings of the38th international conference on software engineering 2016 pp 691ndash701

[18] H D T Nguyen D Qi A Roychoudhury and S Chandra ldquoSemfixProgram repair via semantic analysisrdquo in 2013 35th InternationalConference on Software Engineering (ICSE) IEEE 2013 pp 772ndash781

[19] H V Nguyen H A Nguyen T T Nguyen and T N Nguyen ldquoAuto-locating and fix-propagating for html validation errors to php server-sidecoderdquo in 2011 26th IEEEACM International Conference on AutomatedSoftware Engineering (ASE 2011) IEEE 2011 pp 13ndash22

[20] P Panchekha and E Torlak ldquoAutomated reasoning for web page layoutrdquoin Proceedings of the 2016 ACM SIGPLAN International Conference onObject-Oriented Programming Systems Languages and Applications2016 pp 181ndash194

[21] R Romero and A Berger ldquoAutomatic partitioning of web pages usingclusteringrdquo in Mobile Human-Computer Interaction - MobileHCI 2004Berlin Heidelberg Springer Berlin Heidelberg 2004 pp 388ndash393

[22] H Samimi M Schafer S Artzi T Millstein F Tip and L HendrenldquoAutomated repair of html generation errors in php applications usingstring constraint solvingrdquo in 2012 34th International Conference onSoftware Engineering (ICSE) IEEE 2012 pp 277ndash287

[23] A Sanoja and S Gancarski ldquoBlock-o-matic A web page segmentationframeworkrdquo in 2014 International Conference on Multimedia Comput-ing and Systems (ICMCS) April 2014 pp 595ndash600

[24] X Wang L Zhang T Xie Y Xiong and H Mei ldquoAutomating pre-sentation changes in dynamic web applications via collaborative hybridanalysisrdquo in Proceedings of the ACM SIGSOFT 20th InternationalSymposium on the Foundations of Software Engineering 2012 pp 1ndash11

[25] F Wilcoxon Individual Comparisons by Ranking Methods New YorkNY Springer New York 1992 pp 196ndash202

  • I Introduction
  • II Related Work
  • III Our Framework
  • IV Formulation and Fitness Function
  • V Proposed Algorithms
    • V-A PBRA A pso Based Repair Algorithm
    • V-B TERA a Time Efficient Repair Algorithm
    • V-C Implementation Detail
      • VI Experimental Result
        • VI-A Dataset and Evaluation metrics
        • VI-B Research Question
        • VI-C Numerical result
          • VII Threats to validity
          • VIII Conclusion
          • IX Acknowledgements
          • References