Intelligent non-visual navigation of complex HTML structures

Univ Access Inf Soc (2002) 2: 56–69 / Digital Object Identifier (DOI) 10.1007/s10209-002-0036-4

E. Pontelli (1), D. Gillan (2), G. Gupta (3), A. Karshmer (4), E. Saad (1), W. Xiong (1)

(1) Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA; E-mail: [email protected]
(2) Department of Psychology, New Mexico State University, Las Cruces, NM 88003, USA; E-mail: [email protected]
(3) Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA; E-mail: [email protected]
(4) Department of Information Technology, University of South Florida, Lakeland, FL, USA; E-mail: [email protected]

Published online: 6 November 2002 – Springer-Verlag 2002

Abstract. This paper provides an overview of a project aimed at using knowledge-based technology to improve accessibility of the Web for visually impaired users. The focus is on the multi-dimensional components of Web pages (tables and frames); our cognitive studies demonstrate that spatial information is essential in comprehending tabular data, an aspect that has been largely overlooked in the existing literature. Our approach addresses these issues by using explicit representations of the navigational semantics of documents and a domain-specific language to query the semantic representation and derive navigation strategies. Navigational knowledge is explicitly generated and associated with the tabular and multi-dimensional HTML structures of documents. This semantic representation provides the blind user with an abstract representation of the layout of the document; the user can then issue commands from the domain-specific language to access and traverse the document according to its abstract layout.

Keywords: Non-visual Web – Universal accessibility – Domain-specific languages

1 Introduction

Accessibility to resources in our information-based society is considered an entitlement by the majority of the industrialized world. Through a reasonably simple series of keystrokes and mouse clicks, we are able to access vast amounts of information from distant parts of the world. While the value of all the information available is sometimes questionable, it is there and it is accessible. Accessible, that is, if you are not one of the many handicapped members of our society. For many in this group, such access is difficult or impossible. In the United States, landmark legislation has mandated equal access to the information world for all Americans. Through laws such as the 1990 Americans with Disabilities Act (ADA), the 1996 Telecommunications Act and the Section 508 Amendment to the 1993 Rehabilitation Act, disabled people are guaranteed accessibility “where reasonably achievable.” These three words belie the problem. In the domains of hardware or architectural modifications, the problems, for the most part, are understandable and achievable. In the domain of computer software, however, the situation is somewhat different. The areas of operating-system interfaces and Web access highlight this problem.

In the current work, we have focused on the accessibility of the World Wide Web by blind and severely visually impaired people, both in general and more specifically in the educational setting. In this work, we face two basic problem areas: the accessibility offered by the popular Web browsers and the effect of the design of the actual Web pages. In the former, we are not interested in making changes to the Web browser’s code, while in the latter, we are constrained to making accessibility recommendations to Web page designers. Additionally, we focus on the delivery of Web content to non-visual users in aural format, specifically through speech synthesis.

Given these constraints, our project has identified three of the most difficult areas of Web access through current browsers: tables, frames and forms. While other groups are working on the more general problem areas of Web browsing [2, 21, 22], we have confined our research to these three aspects of the problem, because they are the most difficult to solve in a general way. The general reading of a Web page by the visually impaired can be implemented by presenting a totally text-based version of the page, while this is not possible in the areas of tables, frames and forms. Text is linear in nature, while tables, frames, and forms are multi-dimensional and their layout is an inherent part of their semantic content.

The problem is broad in scope for the visually impaired. Lack of access to these structures has a negative impact not only on everyday-life decision-making, but also on access to educational materials that are now being presented in larger measure via the Web.

2 Project overview

The focus of this work is on developing a collection of tools aimed at improving accessibility of a well-defined set of complex information structures commonly found on the Web. The focus is on non-visual accessibility – to provide support to visually impaired individuals as well as users accessing the Web with devices with no or limited display capabilities – and the research is currently aimed at analyzing the role of HTML tables and frames. These two constructs are widely used in virtually every Web page (indeed the two constructs are often used interchangeably) and they are by nature multi-dimensional. This implies that a linear translation of these components into speech would lead to a substantial loss of semantic content – as illustrated by the cognitive studies conducted as part of this project and described in the next section. This is particularly evident in the case of tables. A table is inherently a multi-dimensional structure, and its spatial layout is an essential component of the semantics encoded in it. The desired output is represented by tools that can assist blind users in navigating such complex data organizations producing aural output. Our work has led to the design and implementation of a system providing user-directed and semantic-based navigation of tables, frames, and other non-linear data structures [13, 18]. This paper provides an overview of the results accomplished in this project to date. The paper is organized as follows. We first present an overview of the cognitive studies performed; these studies have highlighted the importance of providing users with an adequate representation of the spatial nature of tabular information. We next illustrate the structure of the system we have designed, focusing on the use of knowledge-representation techniques (conceptual structures) and domain-specific languages to support user-directed navigation of tables and other complex HTML structures.

3 Related work

A number of proposals have recently been made toward the development of tools to improve Web accessibility for visually impaired people. Official government acts (such as the ADA and the Telecommunications Act) have imposed the adoption of design practices to make Web pages universally accessible, and initiatives such as the W3C Web Accessibility Guidelines [21] have tried to put these directives into practice. In spite of this, a substantial part of the Web is still not designed according to these criteria – even popular Web design tools do not enforce such guidelines. For this reason, in recent years a large number of proposals have emerged, aimed at providing accessibility at the level of the final user, providing the final user with tools to improve accessibility of existing documents. This solution is orthogonal to the application of design guidelines.

The initial proposals based on the use of standard screen readers (such as Jaws) in association with traditional Web browsers had limited success – information on the Web is highly context sensitive and highly structured, and only ad-hoc screen readers are capable of taking advantage of this context-sensitivity and adapting their behavior to the structure of the document; even fewer tools are capable of modifying their behavior to respond also to the desires and capabilities of the end-users. The most notable outcome of this thread of research is represented by the design of aural Web browsers; in these tools, the screen reader is an integral part of a browser and obtains direct access to the internal structure of the documents recovered from the Web. The knowledge of the structure of the documents is employed to drive the generation of an accessible version of the document in the following ways:

• Provide a top-level presentation of the document’s title and headings [14]

• Create aural summaries of the document [22]

• Restructure the document to place emphasis on its main components [2]

• Use sound cues to identify the different components of the document [11]

The Web Accessibility for Blind (WAB) effort at ETH Zurich [14] represents one of the first efforts in this area, where transcoding proxy servers are employed to extend HTML documents with (static) navigation information – specifically links to facilitate the retrieval of titles and hyperlinks. Substantial effort has been invested in IBM’s HomePage Reader (HPR) [2] and the related work by Asakawa et al. [1, 16]. This is indeed the only other proposal that explicitly deals with the issue of table navigation. Similarly to HPR, the pwWebSpeak browser [4] also offers speech output through HTML interpretation. The reported version of pwWebSpeak does not support frames; it represents tables as linear sequences of links (each table location is represented as a separate page), and tables are sequentially navigated left-to-right, top-to-bottom. The BrookesTalk [22] speech-enabled browser provides access to different parts of a Web page using function keys; its novelty is in the use of natural-language summarization to facilitate non-visual navigation.

Another line of related work is represented by the efforts in providing sound and haptic access to data in graphical form (e.g., business graphics); relevant efforts in this area have been proposed by Brewster et al. [19] and Kurze [15].

In this project our aim is to provide integrated client-side (at the browser level) and server-side (at the level of the Web-page author or provider) semantic specifications for the understanding and navigation of complex HTML structures (frames and tables). Relatively few other proposals have tackled this problem and most of them rely exclusively on the linearization of the row-column HTML structure to provide navigation. The need for adaptation of browsers and direct access to the HTML structure for accessibility support has been raised by various authors [3, 10]. This was further underscored by recent survey studies that explicitly pointed out the ineffective performance of existing screen readers when coupled with the Web [5, 8]. Gunderson and Mendelson specifically cite the task of locating information in table structures as one of the most challenging tasks for visually impaired users [8].

4 Cognitive aspects of table navigation

Our first step in designing tools for the navigation of tables and frames is to study how individuals typically read a table. To study the cognitive processes used in reading tables, we began by conducting an archival study in which we examined the structure of tables in selected scientific journals (Science and Nature) and Web pages (from government and commercial sites), as well as the tasks for which the tables were designed. For both media, the structure of the table was a function of the difficulty of the user’s projected task: simple tasks (such as an overview of the data in a table) were typically associated with simple tables (such as the row-by-column matrix), whereas the more complex tasks (such as understanding an interaction between variables) were associated with tables in which the rows or columns contained spatially-defined groups. In addition, Web tables typically contained graphics and hyperlinks. This archival study indicated that two types of spatial organization are used to provide organizational information in tables: the two-dimensional row-by-column layout and spatial grouping.

Following from the results of the archival study, we conducted two series of experiments to examine different means for supplying a table reader with information about the structure of a table via non-spatial means. In both series, we used sighted participants in an attempt to understand the basic cognitive mechanisms used by sighted readers interacting with visually presented information structured in a table format.

Table 1. An example of a table (on pesticides) used to study table reading

            Type          Formulation   Target        Toxicity   Exposure
Vanquish    herbicide     flowable      dandelions    medium     dermal
Zapper      fungicide     powder        mildew        low        ocular
Riddance    insecticide   fogger        cockroaches   high       inhalation
BeGone      rodenticide   granules      mice          extreme    ingestion
Purge       miticide      dust          ticks         nominal    cutaneous

In some of the experiments, we attempted to simulate the flow of information that blind users would experience when they interacted with a table by restricting the visually available information to one item at a time. The goals that underlie this approach are to provide a baseline against which later to compare the cognitive processes of blind users and to understand the information that spatial cues provide a sighted user and possible ways of providing that same information by non-spatial and non-visual means. The participants were from the participant pool at New Mexico State University, which is representative of the larger student body at the university. Consequently, the participants included both males and females ranging in age from 18 to 50, with most of the participants falling between 18 and 23.

In one series of experiments, we investigated the use of tables to learn structured information, much like a chemistry student might learn the periodic table. The experiments were conducted using 5×5 tables, and participants were instructed to learn the information (i.e., a word) in each of the 25 cells. For example, in Table 1 each target word is associated with an individual combination of row and column headers. However, to simulate the flow of information experienced by a blind reader, each cell was presented individually; accordingly, during the learning phase in each experiment, groups comprising the row header, column header and a unique target word were presented to the participant. This type of presentation was called the temporal format. Depending on the condition, the temporal format might be supplemented with a specific cue designed to provide structural information. Participants did not see the underlying table at any time during the experiment; nor were they told that the information was extracted from a tabular display (except as an experimental condition in one experiment). Following the learning phase, participants received a test in which they had to select, for each row-column combination, the correct target word from a list consisting of the correct target word, the remaining target words from the row and column and six foils (for a total of fifteen possible responses).

The first experiment examined the effect of the temporal presentation versus the spatial presentation of tabular information. Participants studied and were tested on three tables that differed in the information contained and the type of cue displayed with the row header, column header and target word. The temporal-only condition (n = 24) consisted of only the row and column headers with the target word. The spatial-separated condition (n = 24) also showed an empty 5×5 table in which the relevant position where the target word would normally be located in a table was highlighted. Likewise, the spatial-integrated condition (n = 24) showed this same table, and the indicator cell contained the target word. During the subsequent test, participants from the spatial-integrated condition made the fewest errors. The data suggests that providing spatial cues results in better learning than the pure temporal presentation of the tabular information.

A second experiment examined whether non-spatial cues could also produce better learning than the temporal presentation format. Experiment 2 used the same basic procedures as experiment 1, in that participants received training with the temporal format and two conditions in which the temporal format was enhanced with another cue. The two non-spatial enhancements in the learning phase involved the use of color or auditory tones. In the color condition (n = 24), each column in the table was assigned a distinct hue, with rows represented by an amount of saturation. In the tone condition (n = 24), each column was assigned a discrete degree of auditory balance of sound-pressure level in the left and right ears. This manipulation of the sound intensity between the ears resulted in the perception of column 1 as being to the far left, column 3 directly in front, and column 5 to the far right of the participant; column 2 was perceived between columns 1 and 3, whereas column 4 was perceived between columns 3 and 5. In other words, the five columns were presented in an auditory spatial structure along the horizontal dimension. Row information was indicated by changes in pitch, with higher pitch indicating higher rows. In this tone condition, participants wore Sony Dynamic Stereo headphones to ensure the desired manipulation of the intensities and pitches to the two ears. In addition, a third condition (n = 24) was identical to the temporal-only condition in the first experiment. Contrary to our hypothesis, the results showed that providing color or auditory cues to the structure of the tabular information resulted in worse test performance than the temporal format.
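For concreteness, the following is a minimal sketch of the cell-to-sound mapping used in the tone condition: column position is rendered as left/right auditory balance, row position as pitch. The concrete numbers (base frequency, pitch step, linear panning) are our own illustrative assumptions, not the parameters used in the experiment.

# Sketch of the tone-condition cue mapping (assumed parameters):
# column (1-5) -> stereo balance from far left to far right,
# row (1-5)    -> pitch, with higher pitch indicating higher rows.

def tone_cue(row, col, n_rows=5, n_cols=5,
             base_freq=220.0, freq_step=60.0):
    """Return (left_gain, right_gain, frequency_hz) for a table cell.

    The gains (in [0, 1]) encode the column as auditory balance; the
    frequency encodes the row.  The 220 Hz base, the 60 Hz step and
    the linear panning are illustrative assumptions, not values from
    the experiment.
    """
    pan = (col - 1) / (n_cols - 1)      # 0.0 = far left ... 1.0 = far right
    left_gain, right_gain = 1.0 - pan, pan
    frequency_hz = base_freq + (n_rows - row) * freq_step  # row 1 = highest
    return left_gain, right_gain, frequency_hz

# Column 3 is perceived directly in front (equal gains in both ears):
print(tone_cue(row=1, col=3))           # (0.5, 0.5, 460.0)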

As a consequence of the surprising finding that adding structural information led to worse performance in the second experiment, a third experiment was designed to explore the hypothesis that the auditory and color cues disrupted learning because participants did not recognize that the cues provided structural information, since they had not been informed that the target information could be organized as a table. Accordingly, in the third experiment, participants were randomly assigned to one of three groups. The no-instruction group (n = 24) received no additional information about the underlying tabular structure. In the structure condition (n = 24), participants were told that the data was organized in a systematic fashion. The table group (n = 24) was told that the data was organized in a systematic fashion and was also presented with an example table and explicitly shown how the keywords (column header and row header) were related to the target word. Additionally, the tone cues were paired with the example table and participants interacted with the table to explore the relationship between the tones and the spatial locations of the target words. For example, if a participant clicked on each row in a single column of the example table, a change in pitch at a constant balance level would be heard progressively down the column. Again the results were contrary to our hypothesis. Participants in the table condition, who were explicitly informed of the underlying structure and shown an example, produced significantly more errors than either the no-instruction or structure conditions. No significant differences in recall performance were detected between these latter conditions.

Thus, these results show a disadvantage for table users when they are given information in a procedure that simulates the flow of information to blind users using a screen reader. The results also show that providing a table user with additional structural cues was helpful only when the cues were spatial in nature; adding color or auditory cues made learning performance worse.

The second series of studies examined the use of spatial and non-spatial forms for indicating groups within a row-by-column matrix table. Thirty participants used tables to answer a variety of questions that varied in complexity. Across the experiments, the participants’ performance was compared when they used a table organized simply by alphabetic listing (baseline), with groupings of table rows indicated by (1) spatial proximity on the display, (2) color-coding, (3) participant-controlled graying-out of the group not selected by the participant, and (4) exploding-out of the selected grouping (that is, the rows of the selected group expand in size and appear to move closer to the reader). The results showed that the various methods used to cue grouping information only aided readers answering complex questions. Spatial grouping, color-coding, graying-out, and exploding-out all produced better performance on selected questions than the alphabetical organization, although exploding-out was the least effective. Interestingly, when the spatial grouping method was combined with any of the other methods, the combination fared no better than the cues taken one at a time. We have proposed that these findings suggest that these forms of grouping cues permit the table reader to shift procedures on complex questions to the procedures that they use in answering simple questions. As a consequence, their performance improves only on those complex questions.

These archival studies and experiments have helped us to understand how important spatial cues are to the construction of tables. Replacing spatial cues with non-spatial ones can confuse a table reader, even when it is understood that the cues provide organizational information.

Work on non-visual representation of interrelated objects has also confirmed that audio representations (sound cues) alone, without interrelational representation, significantly lower the mental-perception level of blind people [12].

These results motivate our efforts toward making use of user interaction and semantic-based navigation instead of the automatic introduction of fixed sets of auditory cues, as done in related proposals. These results also provide a fundamental motivation to our work by highlighting the inherent complexity of table reading (even for sighted users) and the intrinsic dependence of content learning on spatial layout. The lessons learned from these experiments have been valuable in designing the tools described in the rest of the paper and have provided us with valuable evaluation criteria.

5 System structure

Figure 1 provides an overview of the proposed system that we are developing for making complex HTML structures (e.g., tables) accessible. The system is composed of two subsystems. The first subsystem deals with the retrieval of documents from the Web, as directed by visual users (e.g., a teacher preparing course material) or by visually impaired users during Web navigation. Each retrieved document is properly filtered and analyzed, leading to the generation of a semantic description (SD), which is then stored in a local database. The SD is aimed at explicitly representing the desired spatial layout for the components of the document, and supports the user-driven traversal of such layout. Thus, the SD effectively becomes a roadmap that the blind user can interactively follow to navigate the document’s components. As explained later, SDs can be generated either manually, through a dedicated graphical user interface (GUI), or automatically, through syntactic and semantic analyzers.

Fig. 1. Overall system structure

The second phase involves the actual non-visual navigation, where requests (navigational goals) from a blind user are paired with the available SDs to support interactive and intelligent non-visual navigation. This second phase is conducted with the help of an aural browser. Navigational goals can range from simple interactive traversals of the multi-dimensional structure to more sophisticated goal-oriented search problems (such as locating items of the structure according to a given set of properties). A domain-specific language (DSL) is introduced to support interaction between blind users and the aural browser.

In the following sections, we describe in more detail the structure of the SDs used in this project and the domain-specific language used to support navigation.

6 Semantic descriptions

One of the key aspects of the solution we propose in this project is the adoption of knowledge-representation mechanisms, specifically conceptual graphs [20], to capture the navigation structure of a complex HTML component. We will refer to the structure used to encode the navigation structure of a complex HTML component as its navigational SD (or simply semantic description (SD)). The purpose of the SD is to explicitly describe the desired spatial layout of the HTML component (e.g., a table). The SD is synthesized in a semi-automatic fashion (as described later). The SDs are used to (interactively) guide the process of navigating the document’s component, through the use of an aural navigator. Ultimately, the use of SDs allows one to customize the navigation process according to the semantic structure of each individual document.

In the rest of this section, we describe in detail the structure of our SDs and how SDs are obtained in the system we are developing.

6.1 Conceptual graphs

The knowledge representation scheme that we propose to adopt in this project is based on conceptual graphs. Conceptual graphs and their associated theory, conceptual structure theory, were proposed in the 1970s as a way of drawing logical statements in diagrammatic form rather than in a linear text-based calculus. The basic ontology is very simple, as it is in mathematical logic. A conceptual graph can have two kinds of node: a concept node, which represents types and objects of those types, and a relation node, which represents a relationship between these objects. The theory allows for a basic expressiveness that is equivalent to first-order logic, as well as mechanisms for defining concepts and for representing type hierarchies of concepts and relations. Researchers have extended the formalism to allow representation and manipulation of more advanced ontologies, especially those involving actions and events, and higher-level structures such as viewpoints and nested contexts.
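To make the formalism concrete, here is a minimal sketch of such a bipartite structure – concept nodes and relation nodes – in Python; the type names and the travel-expenses referents are our own illustrations (anticipating the example of Sect. 7), not the project’s actual data structures.

# Minimal sketch of a bipartite conceptual graph: concept nodes
# (typed objects) and relation nodes (relationships between them).
from dataclasses import dataclass, field

@dataclass
class Concept:
    type: str                # e.g., "table", "row-group", "cell"
    referent: str            # the individual object, e.g., "seattle"

@dataclass
class Relation:
    label: str                                  # e.g., "part-of"
    args: list = field(default_factory=list)    # ordered Concept arguments

# A tiny graph: a cell belongs to the "seattle" row group,
# which in turn belongs to the expenses table.
table   = Concept("table", "expenses")
seattle = Concept("row-group", "seattle")
meal    = Concept("cell", "8-28meal")

graph = [
    Relation("part-of", [seattle, table]),
    Relation("part-of", [meal, seattle]),
]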

There are two kinds of structure present in any Web page: syntactic and semantic. The former is hierarchical, as reflected in the use of HTML/XML. Before XML was proposed, the only way to navigate a Web page was to follow this syntactic structure. Any other navigation technique was forced to rely on search through content, especially if no meta-data was present. With the advent of XML, meta-data representation has become far easier, since a Web page can now store arbitrary amounts of additional information in addition to its main content. An XSL style sheet (or other XML processor) can choose to ignore this extra content when displaying the document.

Fig. 2. Creation of semantic descriptions

It has been suggested that this extra meta-data content can be used for Web-page retrieval; it can also clearly be used for Web-page navigation. It is this latter use that we are proposing as semantic navigation. Since the content of this meta-data is arbitrary, links between meta-data items and between meta-data and the main content of the page can cut across the syntactic structure of the page. These links have to be represented separately from the page itself, possibly in an associated document, just as display information can be separated from the page and placed in a style sheet. We are proposing to use conceptual graph formalisms for the representation of the semantic links.

The creation of the SD for a multi-dimensional component of a document (e.g., a table) represents the key problem we propose to solve in this project. The process involves two steps: (i) identification of the concept nodes of the conceptual graph, and (ii) identification of the relation nodes of the conceptual graph. In our context, the concept nodes of the graph represent the semantic entities that are described by the document’s component, either the individual cells of a table or semantically meaningful groups of cells. Nodes are commonly organized according to one or more hierarchies. The lower level of each hierarchy commonly includes syntactic elements directly extracted from the document (such as cells of a table or panes of a frame structure). The higher levels of the hierarchies provide semantic entities representing general concepts or collections of concepts (such as viewing a column as a collection of cells).

The edges of the conceptual graphs represent relationships between the conceptual entities identified as nodes of the graphs. A natural class of relationships originates from the presence of a hierarchy between different concepts present in the graph, such as the natural inclusion relationship between collections of cells (a cell belongs to a row that belongs to a given group of rows, and so on). The conceptual graph representing a document component (e.g., an HTML table) will be created by combining three sources of knowledge: (i) the syntactic content of the document (e.g., use of HTML tags and attributes); (ii) direct input from a human annotator (e.g., the teacher, the creator of the document, a third party); and (iii) the history of how the document’s components have been used in the past (through databases of access traces). These methods are analyzed in more detail in the next three subsections (and summarized in Fig. 2). It is important to observe that, in our framework, the knowledge representation is not static; rather, it is highly dynamic and can be updated at any time.

6.2 Explicit generation of SDs

It is fairly clear that reliable, purely automatic generation of SDs for every complex HTML structure (CHS) is an impossible task [20]; for example, the syntax used to express CHSs is inadequate to explicitly express complete navigation strategies for such document components. In order to make the task of constructing the SD more realistic, we restricted our focus to the context of coursework management; that is, the various documents are part of the material offered to the students of a course. There are some clear advantages in taking such a perspective: (i) we can assume the presence of an instructor, who will take charge of generating and/or completing the semantic descriptions wherever they are lacking or incorrect; and (ii) we can assume the presence of a controlled population of users, whose actions can be, to a certain extent, supervised.

The first and simplest approach toward the construction of the SD of a complex structure is the manual approach: a human annotator (e.g., an instructor or a TA) makes use of a specialized tool to create the descriptions while assembling the course material.

Fig. 3. Graphical user interface for creation of semantic descriptions

This task is accomplished through the use of a specialized GUI. As shown in Fig. 2, each request generated by the instructor/TA is filtered by a proxy server; each time the incoming document contains a CHS (in our tests, we have focused on HTML tables), the proxy server automatically extracts the CHS and starts the annotation GUI. As shown in Fig. 3, the GUI presents the user with an abstraction of the table (rows and columns, no content); the human annotator can select arbitrary groups of cells in the table and assign to them a description – thus generating new abstraction levels in the SD. The tool has been developed in Java and provides the following capabilities: (i) specialized parsing routines focused on the extraction of CHSs; and (ii) integration of the syntactic analyzer (described next), thus offering the annotator an initial SD to work on.

6.3 Syntactic analysis of tables

The task of developing the SD for a complex HTML structure is facilitated by the introduction of a syntactic analyzer. The objective of the syntactic analyzer is to extract as much navigational information as possible from the syntactic structure of the CHS. Once again, we have focused on the analysis of HTML tables to assess the feasibility of this phase. The intuition behind this syntactic analysis is to recognize collections of table cells (i.e., HTML <TD> elements) that the syntactic layout of the table suggests should be grouped together. There are a variety of different items in the syntactic layout of the table that may suggest grouping of cells: for example, (i) the explicit grouping of cells and rows performed through the use of elements such as <COLGROUP> and attributes such as COLSPAN; (ii) the use of indexing elements in the table, such as header tags (<TH> in combination with the scope attribute, and <THEAD>); (iii) the identification, in each cell, of one or more header elements (through the headers attribute in combination with the axis attribute); and (iv) in addition to the above components (which are all explicit HTML constructs used to structure a table), the additional visual features that Web authors frequently use to suggest grouping of cells; the most typical approach is the use of different background colors, font colors, and sizes.

We have developed a syntactic analyzer that uses these features to recognize possibly meaningful groupings of cells. These groupings are proposed to the manual annotator (through the GUI described in the previous section); at that stage, the human annotator has the option to accept, modify, or reject the suggested collections.
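As an illustration of the kind of heuristic involved, the following sketch clusters cells that share an explicit background color, using Python’s standard html.parser; it is our own reconstruction of the idea, not the project’s Java implementation.

# Sketch of one grouping heuristic of a syntactic analyzer: cells
# sharing an explicit background color are proposed as a group.
from html.parser import HTMLParser
from collections import defaultdict

class ColorGroupAnalyzer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)   # bgcolor -> list of cell texts
        self._color = None

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._color = dict(attrs).get("bgcolor")

    def handle_data(self, data):
        if self._color is not None and data.strip():
            self.groups[self._color].append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._color = None

analyzer = ColorGroupAnalyzer()
analyzer.feed('<table><tr><td bgcolor="gray">Total</td>'
              '<td bgcolor="gray">$120</td><td>Meals</td></tr></table>')
print(dict(analyzer.groups))              # {'gray': ['Total', '$120']}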

For example, consider the table on the left in Fig. 4; the syntactic analyzer is capable of recognizing the components of the semantic structure illustrated on the right in Fig. 4. The original table explicitly indicated (using HTML 4.0 attributes) the cells Lodging and Meals as headers (through the header attribute). The cells with the dark background were also singled out by the analyzer and grouped together under a separate artificial header (called Gray cells in the figure). The nodes in the graph represent either concepts (e.g., the whole table, the content of individual cells) or relations (header names, other groupings of cells suggested by the layout of the table).

Fig. 4. Result of syntactic analysis

The edges in the graph connect elements of the table according to their layout. This simple graph representation provides a clean layout of the table as composed of different groups of columns (meals and lodging), themselves grouped according to the travel destination. In turn, the analyzer has recognized an additional grouping of cells, containing the totals for each trip entry.

6.4 Usage-based synthesis of SDs

6.4.1 Pathfinder networks

In this phase of the project, we have employed pathfinder networks to generate a model for Web navigation. The intuition is to maintain a collection of traces, where each trace contains the events generated during a previous navigation session of the same document. Pathfinder networks can be used to process these traces and generate clusters of table cells that can be translated into new levels of abstraction in the SD.

Network models have been used in many areas of computer science, including artificial intelligence, operating systems, and databases. One of the methods that produce these network models is Pathfinder (PFNET) [6]. The pathfinder algorithm finds the shortest path between every two entities in a weighted graph and produces a network structure based on estimates of similarities between its entities. Distances can be obtained by estimating the similarity between each pair of entities.

The input to the pathfinder algorithm is represented by a collection of entities (represented as nodes in a graph) and estimates of the distance (or similarity) between certain pairs of nodes. The outcome is the generation of a network where links between nodes are present if the corresponding entities are considered sufficiently similar (and each link bears a weight measuring such similarity). PFNET introduces a special metric, the Minkowski r-metric, to compute the distance between each pair of nodes that are not directly linked. The r-metric is the rth root of the sum of the rth powers of the link weights along the path between the considered nodes.

Fig. 5. Network obtained from PFNET algorithm

For r = 1, the path length is the sum of all weights in the path; for r = 2, the length is the Euclidean distance; and for r = ∞, the length is the maximum weight of any link in the path.
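In symbols, for a path P whose links have weights $w_1, \dots, w_k$, the r-metric path length is

$$ W_r(P) = \Big( \sum_{i=1}^{k} w_i^{\,r} \Big)^{1/r}, $$

so that $W_1$ is the plain sum, $W_2$ the Euclidean combination, and $W_\infty = \max_i w_i$.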

The generation of the network is based on a second parameter, called the q-parameter, which specifies the maximum number of links in paths that are guaranteed to satisfy the triangle inequality. Intuitively, a path between two nodes satisfies the triangle inequality if the weight of the link between the extremes of the path is no greater than the length of the path (according to the r-distance).

Links are eliminated from the PFNET because they violate the triangle inequality over paths that have q or fewer links; in the resulting PFNET, triangle-inequality violations can still be found in paths that have more than q links. Larger values of q result in fewer triangle-inequality violations, and there are no violations in the resulting PFNET when q is the number of nodes minus 1. The link membership rule (LMR) is used to determine whether or not links should be added to the pathfinder network. This is accomplished by ordering all links in the network in increasing order according to their weights. A direct link between a pair of nodes is included in the pathfinder network if and only if its weight is less than the weight of any path with q or fewer links (triangle inequality), using the r-metric to compute path weights. In addition, the pathfinder network uses the link labeling rule (LLR), which provides a label for each link according to a classification scheme. Links are labeled as primary, secondary, or ternary depending on their role in the formation of the pathfinder network. A primary link is the only path joining a connected subgraph of the PFNET to a subgraph consisting of a single node. A secondary link joins subgraphs in the PFNET or provides an alternate path between nodes already connected by primary links. A link is labeled ternary if it joins nodes within a subgraph for which alternate paths already exist.
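The pruning at the heart of PFNET(r, q) can thus be summarized: a link survives if and only if no path of at most q links is shorter under the r-metric. The sketch below is a direct, unoptimized reading of this rule – our own reconstruction, not the original implementation.

# Sketch of PFNET(r, q) pruning: an edge survives only if no path of
# at most q links is shorter under the Minkowski r-metric.
import math

def combine(a, b, r):
    """r-metric length of two path segments joined end to end."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def pfnet(n, dist, r=2.0, q=None):
    """dist maps a pair (i, j), i < j, to a link weight; nodes are
    0..n-1.  Returns the set of links surviving PFNET(r, q)."""
    if q is None:
        q = n - 1                        # the no-violation setting
    INF = float("inf")
    direct = [[INF] * n for _ in range(n)]
    for (i, j), w in dist.items():
        direct[i][j] = direct[j][i] = w
    best = [row[:] for row in direct]    # shortest over paths of <= 1 link
    for _ in range(q - 1):               # extend paths one link at a time
        step = [row[:] for row in best]
        for i in range(n):
            for k in range(n):
                if best[i][k] == INF:
                    continue
                for j in range(n):
                    if direct[k][j] < INF:
                        cand = combine(best[i][k], direct[k][j], r)
                        if cand < step[i][j]:
                            step[i][j] = cand
        best = step
    return {(i, j) for (i, j), w in dist.items() if w <= best[i][j]}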

6.4.2 Application of pathfinder networks

In this project, we have encoded the pathfinder network generation algorithm to support the maintenance of SDs. A specialized tool has been developed to offer sighted and blind Web users the opportunity to generate traces of the steps taken in navigating a document. Traces are in turn translated into inputs to the PFNET algorithm; nodes correspond to table cells and/or known clusters of cells, while the initial measures of similarity are derived from the distance between nodes in the traces. The resulting network is used to detect new clusters of cells, i.e., nodes linked by primary edges.

We have performed a variety of experiments to validate the effectiveness of the usage-based synthesis of semantic representations via pathfinder networks. The example in Fig. 5 was obtained by processing one of the tables from the Ohio State University weather server; this home page provides marine, NCEP, and satellite textual weather data. The data is organized in tables; each cell in a table is a hyperlink to another home page. Our target is to navigate through these tables to produce a number of navigation traces, and then use such traces to create a network model (PFNET) from the hyperlinks accessed during the navigation. We have worked on the Alaskan marine data table. This table has 15 hyperlinks; these hyperlinks represent the nodes in the PFNET. We have used a number of navigation sequences (14 traces) to build the PFNET, with the parameters r = 2 and q = 14¹. The distances between nodes are measured according to how close cells appear in a navigation sequence. The network model obtained from PFNET can then be used to automatically add new clusters to the SD of the table. For example, the figure shows a significant clustering around the “Eastern Gulf Coast” cell – the cells reached from this one can be clustered into a single group for navigation purposes. This allows us to navigate through the nodes (links) level-wise, depth-wise, or both. This way of navigating is better than traditional row- or column-wise navigation because it is based on navigating the most important (frequently visited) hyperlinks first, which may lead the user to target information much faster. A final post-processing phase is used to integrate the PFNET obtained into the main SD; groups of nodes clustered by PFNET are translated into new levels of abstraction in the SD. Informal evaluations have demonstrated that PFNET generates clusters of nodes of good quality, especially in the presence of a large number of traces.

¹ Parameters suggested for this problem by D. Dearholt, one of the original designers of Pathfinder.

7 Domain-specific languages for table navigation

In [18] we presented for the first time a domain-specific language designed to support navigation of complex HTML structures. This solution implies that each step in the navigation of a CHS is reduced to the execution of a sequence of commands in a specialized language. There are various advantages in following this approach; for example, one can predefine (parts of) navigation strategies and provide a clean interface to support different forms of disabilities (such as partial blindness). The SD can be explicitly generated through a sequence of DSL commands (thus allowing for its dynamic manipulation). This ability allows the dynamic reconfiguration of the SD, allowing users to “keep their finger on the table” during navigation. It also reduces the management of the SD to an explicit procedure that can be reused and parameterized to accommodate orthogonal changes in documents’ content (e.g., dynamic Web pages). In turn, the DSL commands that generate the SD can be automatically generated by the syntactic and usage-based analyzers. The effect of each navigation command is to produce an aural response. As long as the navigation remains within the layout of the table (non-leaf elements of the conceptual graph), the aural response is a description of the element of the conceptual graph (typically obtained from the header elements of the table). Whenever the navigation reaches a leaf element of the SD – a cell of the table – a basic HTML aural reader is invoked to reproduce the content of the cell.

The key commands used to describe the semantic structure are: (i) connected(Node, Node, Description), stating that the two nodes are part of two layers and the description is the conceptual definition of their relation; and (ii) content(Node), describing the content of a given node (e.g., the HTML stored in a table cell). Navigation is performed through standard graph-traversal commands: (i) concrete(Concept) and abstract(Concept), which allow one to move between levels in the semantic representation (e.g., move from a group of rows to a specific row); and (ii) previous and next, which move within a conceptual layer (e.g., move between cells in a column).

For example, the construction of the SD for a table representing some travel expenses is obtained by executing commands of the type:

group(expenses, [san-jose, seattle])
group(san-jose, [25august])
group(seattle, [27august, 28august])
group(25august, [8-25meal, 8-25hotel, 8-25transport])
group(27august, [8-27meal, 8-27hotel, 8-27transport])
group(28august, [8-28meal, 8-28hotel, 8-28transport])
group(8-25meal, [37.24])
...

Some additional commands [18] are also present to query the current status of the navigation process – such as recalling the cells that have already been explored.

A sample sequence of commands that allows the user to access a table representing travel expenses and obtain the travel, lodging, and meal expenses for a trip to Seattle on a given date looks as follows:

// Sample Navigation
concrete(seattle)
concrete(28august)
concrete(8-28meal)
abstract
concrete(8-28hotel)
abstract
concrete(8-28transport)

The response to each command is an aural message describing the current location of the user in the structure; for example, after executing concrete(seattle) the aural browser will speak the dates of the travels to Seattle, allowing the user to select the next command, such as concrete(28august), which accesses the expenses of the travel to Seattle conducted on 28 August.
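To make these semantics concrete, the following is a minimal sketch of an interpreter for the concrete, abstract and next commands over a group hierarchy like the one above. The stack-based cursor and the print-based output (standing in for speech synthesis) are our own simplifications.

# Minimal sketch of a navigator over the group(...) hierarchy above;
# print stands in for speech output.
groups = {
    "expenses": ["san-jose", "seattle"],
    "san-jose": ["25august"],
    "seattle":  ["27august", "28august"],
    "28august": ["8-28meal", "8-28hotel", "8-28transport"],
}

class Navigator:
    def __init__(self, root):
        self.path = [root]               # trail of concrete() moves

    def _speak(self):
        node = self.path[-1]
        print(node, "->", groups.get(node, "(leaf: read cell content)"))

    def concrete(self, name):            # move one level down
        if name in groups.get(self.path[-1], []):
            self.path.append(name)
        self._speak()

    def abstract(self):                  # move one level up
        if len(self.path) > 1:
            self.path.pop()
        self._speak()

    def next(self):                      # next sibling within the layer
        if len(self.path) > 1:
            siblings = groups[self.path[-2]]
            i = siblings.index(self.path[-1])
            self.path[-1] = siblings[(i + 1) % len(siblings)]
        self._speak()

nav = Navigator("expenses")
nav.concrete("seattle")                  # seattle -> ['27august', '28august']
nav.concrete("28august")
nav.concrete("8-28meal")                 # 8-28meal -> (leaf: read cell content)
nav.abstract()
nav.concrete("8-28hotel")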

8 DSL for frame navigation

Frame-based Web pages are yet another source of difficulty for blind users. Most screen readers will traverse the frames in a frame-based Web page in an arbitrary order. Quite often, the author of the frame-based page intended the frames to be viewed in some specific order. The information regarding the order in which the frames are to be viewed is implicit in the contents of the frames and has to be inferred through visual inspection. This information regarding the order is lost to the screen reader, and thus the screen reader might read out the frames in the wrong order. For example, the most common usage of frames is for displaying an index frame and a content frame, where the content of the various items in the index can be seen in the content frame by clicking on them; in this situation a rational screen reader should read the index page first, then the contents page, and then go back to the index page.

For making frame-based pages accessible, we use an approach similar to the one used for making tables accessible to blind users, described in the previous section. A DSL (which is essentially a subset of the one described earlier for tables) is designed to allow navigation of frame-based pages. A frame-based page is divided into a number of frames. Our DSL treats these frames as a set and allows creation of a conceptual graph representing the layout of the different frames. We are assuming that the graph for a frame page is hierarchical; the levels in this graph represent groups of frames. The DSL also provides commands for reification and abstraction to allow one to move between levels in such a hierarchy; thus, navigation is effectively viewed as the user-directed traversal of a tree structure, where the leaves represent the content of the frames. A simplified grammar for this DSL is given below:

Program     ::= Declaration ; Command
Declaration ::= Declaration ; Declaration
             |  url Name = URL
Command     ::= Command ; Command
             |  group(Name, Namelist)
             |  abstractto(FromGName, ToGName)
             |  goto(Name)
             |  abstract
             |  concrete(Name)

The url command allows associating symbolic names (URL variables) to specific URLs. The goto(Name) statement can direct the browsing sequence to any declared URL or group; Name can be either a URL variable name or a group name. The current position and current level (in the abstraction hierarchy) are updated to Name. The abstract statement directs the browsing sequence to the next higher level (if one exists) in the conceptual hierarchy. The current level and current position in the hierarchy are updated as follows: if an upper level does not exist, the current level and position are unchanged. Otherwise, if the current level is a URL variable, the new current level is set to the group that contains this URL, and the new current position is set to the last element in the list of URLs associated with the group. If the current level is a group name, the new current level is set to the group name of the upper level, and the new current position is set to the last element in the list of URLs associated with the upper-level group. The concrete(Name) command directs the browsing sequence to a lower conceptual level (closer to the actual frame panes), if one exists. The current level and current position are updated as follows: if a lower level does not exist, the current level and position are unchanged. Otherwise, the first element of the list associated with Name is examined: if it is a URL variable, the new current level is set to it, and the new current position is set to the URL variable name. If it is a group name, the new current level is set to this group name, and the new current position is set to the first element in the list of names corresponding to this group. The group(Name, Namelist) statement assigns a set of URL variables to a group. A group represents a new level in the browsing hierarchy. The URL variable list, current position and browsing sequence remain unchanged by this command. The abstractto(FromGName, ToGName) statement associates an upper (more abstract) level ToGName to FromGName; applying the operation abstract(FromGName) will result in control moving to the higher level (ToGName) in the hierarchy. The concreteto(FromGName, ToGName) statement defines a (lower) concrete level ToGName for the level FromGName; the URL variable list and current position remain unchanged. Applying the operation concrete(FromGName) will result in control moving to ToGName.
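As an illustration, a hypothetical program in this DSL for the index/content layout discussed above might read as follows (the URLs and names are ours, for illustration only):

url index = http://www.example.org/index.html ;
url content = http://www.example.org/items.html ;
group(main, [index, content]) ;
goto(index)

Here the group declaration makes the intended index-then-content reading order explicit, and goto(index) starts the browsing sequence at the index frame.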

9 From DSLs to action theories

To understand a table, even when familiar with its content, a visually impaired user has no choice but to examine all of its cells. Any language that supports the process of understanding a table needs to be designed to help users navigate the table, perhaps at different levels of abstraction. It also needs to provide a simple way for users to find specific information – to locate a specific cell whose content satisfies some criteria. We argue that all of these features can be supported under an action theory framework [7]. Action theories have been proposed in the artificial-intelligence community to allow the formal description of domains of interest, in terms of properties that describe the domain and actions that an agent can perform to interact with and modify the domain. Under this perspective, a visually impaired user can be viewed as an agent that tries to explore an environment. The agent has the capability of moving around (moving a finger from location to location) and can comprehend what is in a cell². Gradually, together with general knowledge about the world, our agent will arrive at conclusions that are useful. For instance, the conclusion could be drawn that the table in Fig. 4 contains travel-expenses information and could, for example, be used to determine the total expenses for the LA trip.

² How this can be done is an interesting topic and deserves an in-depth study that is not the main topic of this paper.

In the action theory framework, the results of the navigation process are described by a set of properties describing the environment, called fluents. The commands that allow the user to navigate between table cells are called actions. Actions can be divided into two types: basic actions and table-specialized actions. The former correspond to moving the finger around the table, thus providing the basic navigation capabilities to users; observe that the structure of a basic action is table independent. The latter are, in many cases, carefully developed procedures that help users to understand the table without having to navigate through the whole table. Action languages can be used to formulate goals that an agent intends to accomplish; planning technology can then be employed to automatically derive the sequence of actions needed to accomplish the goal. In our context, this generalization provides the ability to carry out more complex navigation tasks. For example:

1. It allows the user to describe the navigation objective as a goal and let automatic mechanisms (i.e., a planner) develop (part of) the navigation process; for example, a query of the type

   ? achieve located_at(Cell), type(Cell, city),
             connected(Cell, Expense, lodging),
             content(Expense) > $150

   will develop a navigation plan (on behalf of the user) that leads to a cell (located_at) that contains a city (type) and where lodging expenses are more than $150. (A minimal planner sketch for goals of this kind is given at the end of this section.)

2. It allows the semantic description to predefine not only complete navigation strategies but also partial skeletons, making the remaining part of the navigation dependent on run-time factors, such as the user's goals, specific aspects of the table's content, and the user's run-time decisions. For example, the following action constraint can be added to Table 1 to force the user to access information about the toxicity level immediately after accessing a new pesticide:

      always ( if located_at(Cell) and type(Cell, pesticide)
               then next moveto(Cell, toxicity) )
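As promised after the first item, here is a simplified planner sketch. The actual system relies on logic-based planning [17]; this breadth-first search over the basic actions, and every name in it, is our own illustration:

    from collections import deque

    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def plan(table, start, goal):
        """Return a sequence of basic actions leading from `start` to
        a cell whose content satisfies `goal`, or None if unreachable."""
        frontier, visited = deque([(start, [])]), {start}
        while frontier:
            (r, c), actions = frontier.popleft()
            if goal(table[r][c]):
                return actions                   # navigation plan found
            for name, (dr, dc) in MOVES.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < len(table) and 0 <= nc < len(table[0]) \
                        and (nr, nc) not in visited:
                    visited.add((nr, nc))
                    frontier.append(((nr, nc), actions + [name]))
        return None

    # A numeric analogue of the lodging query above, assuming a table
    # of numbers: plan(expenses, (0, 0), lambda cell: cell > 150)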

Details of this generalization have been presented in [17].

10 Implementation details

System interfaces. The interfacing of the various system components with the outside world has been accomplished using standard off-the-shelf interfaces. The system interacts with the outside world at three different levels. First of all, the interactions with the external servers are accomplished via transcoding proxy servers, as illustrated in the previous sections. We have currently implemented two different proxy servers. The first handles the annotation process; that is, it interacts with a standard Web browser (Netscape) and redirects the incoming documents to the annotation GUI and the syntactic analyzer. This transcoding proxy server also enriches the received documents with calls to a tracer applet; this allows the Web browser to offer the user the ability to trace access to the tables' content. These traces are then used by the pathfinder algorithms to improve the SD. This proxy server has been built from scratch, as a simple C program interacting with external Java programs (the annotation GUI and the syntax analyzer). The second proxy server is employed by the speech navigator to retrieve each document accessed by the non-visual user and to retrieve (if present) the SD associated with the CHS present in such a document. In this second proxy server, we have experimented with IBM's WBI Development Kit (Web Based Intermediary). WBI is programmed to act as a proxy and to serve as an HTTP request and response processor; a plug-in (a module programmed into WBI) has been developed to interface the incoming HTTP requests and the incoming documents with the SDs in the local repositories.

Speech synthesis. Speech synthesis is currently produced using the IBM ViaVoice Runtime and Software Development Kit for Text-to-Speech tools.

Implementation of DSLs. The development of the DSL relies on the use of advanced logic-programming technology. The specification of the DSL has been accomplished through the use of Horn-logic denotations [9], a software-engineering methodology based on logic programming to specify domain-specific languages and automatically derive provably correct implementations. The DSL inference engine is also currently being extended to take advantage of the powerful capabilities offered by the underlying logic-programming layer. As discussed earlier, the DSL can be seen as an instance of an action language; this allows the use of sophisticated logic-based planning and reasoning algorithms to (partially) automate the navigation process and to delegate to software agents some of the more tedious tasks.
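To make the transcoding-proxy architecture more tangible, here is a deliberately simplified sketch of a proxy that retrieves a requested document and pairs it with a locally stored SD. It is ours alone: the actual servers were written in C and on top of WBI, and every name below, including the repository path and the SD lookup scheme, is hypothetical:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen
    import hashlib, os

    SD_REPOSITORY = "./sd_repository"   # hypothetical local SD store

    def lookup_sd(url):
        """Return the SD associated with a URL, if one exists."""
        key = hashlib.md5(url.encode()).hexdigest()
        path = os.path.join(SD_REPOSITORY, key + ".sd")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        return b""

    class TranscodingProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # In proxy mode, the request line carries the absolute URL.
            document = urlopen(self.path).read()
            sd = lookup_sd(self.path)
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            # Deliver the document together with its SD (naively appended
            # here; the real system hands the SD to the speech navigator).
            self.wfile.write(document + b"<!-- SD: " + sd + b" -->")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), TranscodingProxy).serve_forever()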

11 Conclusions and future work

This paper provides an overview of a project conducted over the last two years with the objective of improving Web accessibility for visually impaired individuals. The proposed approach is based on the idea of deriving, either automatically or with the help of a human annotator, a semantic description of the more complex components of a document (e.g., tables, frames), and using such a description to guide the navigation of the document. We proposed different ways to derive and maintain the semantic representations. We have also developed a domain-specific language, which allows the user to query the semantic representation and derive navigation strategies. The language has been validated in two separate contexts: the navigation of tables and the navigation of frames. The different components have been interfaced and integrated, and the complete system is now becoming operational.

We are currently experimenting with adapting the same technology for the interactive navigation of XML fragments. XML provides a clean syntactic organization of documents (a hierarchical structure), but often this syntactic organization does not reflect the desired navigational behavior (in the same way that we need to resort to XSLT to provide an adequate visual display). We are currently exploring methodologies to develop a navigational SD from the information contained in the XSLT sheets associated with an XML document. The actual navigation can then be realized by adopting the same type of DSL as that used for tables and frames.

We plan to perform extensive usability studies to investigate the effectiveness of the proposed techniques in accessing multi-dimensional data organizations in a non-visual fashion. Preliminary experiments with sighted users are in progress, and experiments with blind users are in the planning stage.

Acknowledgements. The work has been partially supported by NSF grants HRD-9906130, CCR-9875279, EIA-0130887, EIA-9810732, and by a DoEd NIDRR grant. The authors would like to thank S. Pazuchanics and E. Pennington for the effort invested in developing the cognitive component of this project.

References

1. Asakawa C, Itoh T (1998) User interface of a home page reader. In: Blattner MM, Karshmer AI (eds) Proceedings of the third international ACM conference on assistive technologies. ACM Press, New York

2. Asakawa C, Laws C (1998) Home page reader: IBM's talking web browser. Technical report. IBM

3. Brewer J, Dardailler D, Vanderheiden G (1998) Toolkit for promoting web accessibility. In: Proceedings of the technology and persons with disabilities conference, Los Angeles, Calif., 17–23 March 1998

4. De Witt JC, Hakkinen MT (1998) Surfing the web with pwWebSpeak. In: Proceedings of the technology and persons with disabilities conference, Los Angeles, Calif., 17–23 March 1998

5. Earl C, Leventhal J (1999) A survey of Windows screen reader users: recent improvements in accessibility. J Vis Impairment Blindness 93(3):174–177

6. Fowler RH, Dearholt DW (1989) Pathfinder networks in information retrieval. Technical report MCCS-89-147. New Mexico State University

7. Gelfond M, Lifschitz V (1998) Action languages. Electr Trans AI 3(16). http://www.ep.liu.se/ea/cis/1998/016/. Cited 11 September 2002

8. Gunderson J, Mendelson R (1997) Usability of World Wide Web browsers by persons with visual impairments. In: Proceedings of the RESNA annual conference, Pittsburgh, Penn., 20–24 June 1997. RESNA Press

9. Gupta G, Pontelli E (2002) Specification, implementation, and verification of domain specific languages. In: Computational logic: from logic programming into the future. Springer, Berlin Heidelberg New York

10. Hendrix P, Birkmire M (1998) Adapting Web browsers for accessibility. In: Proceedings of the technology and persons with disabilities conference, Los Angeles, Calif., 17–23 March 1998

11. James F (1997) Presenting HTML structures in audio. Technical report. Stanford University

12. Kamel HM, Roth P, Sinha RR (2001) Graphics and user's exploration via Simple Sonics (GUESS): providing interrelational representation of objects in a non-visual environment. In: Proceedings of the international conference on auditory display, Espoo, Finland, 29 July–1 August 2001

13. Karshmer A, Pontelli E, Gupta G (1999) Software technology and computer interfaces for the disabled: non-visual navigation of the World Wide Web. In: Proceedings of HCI international, Munich, Germany, 22–27 August 1999

14. Kennel A, Perrochon L, Darvishi A (1996) WAB: WWW access for blind and visually impaired computer users. In: Burger D (ed) New technologies in the education of the visually handicapped. John Libbey Eurotext, Paris

15. Kurze M, Holmes E (1996) 3-D concepts by the sighted, the blind, and from the computer. In: Proceedings of the 5th international conference on computers helping people with special needs, Linz, Austria, 17–19 July 1996

16. Oogane T, Asakawa C (1998) An interactive method for accessing tables in HTML. In: Blattner MM, Karshmer AI (eds) Proceedings of the third international ACM conference on assistive technologies. ACM Press, New York

17. Pontelli E, Son TC (2002) Navigating HTML tables: planning, reasoning, and agents. In: Proceedings of the international conference on assistive technologies, Edinburgh, Scotland, 8–10 July 2002. ACM Press, New York, pp 73–80

18. Pontelli E, Xiong W, Gupta G, Karshmer A (2000) A domain-specific language framework for non-visual browsing of complex HTML structures. In: Tremaine M, Cole E, Mynatt E (eds) The fourth international ACM conference on assistive technologies. ACM Press, New York

19. Ramloll R, Yu W, Brewster S, Riedel B, Burton M, Dimigen G (2000) Constructing sonified haptic line graphs for the blind student. In: Tremaine M, Cole E, Mynatt E (eds) The fourth international ACM conference on assistive technologies. ACM Press, New York

20. Sowa JF (1984) Conceptual structures. Addison Wesley, Upper Saddle River, N.J.

21. Vanderheiden G, Chisholm W, Jacobs I (1998) WAI accessibility guidelines: page authoring. Technical report WD-WAI-PAGEAUTH-19980918. W3C

22. Zajicek M, Powell C, Reeves C (1999) Ergonomic factors for a speaking computer interface. In: Hanson M (ed) Contemporary ergonomics. Taylor and Francis, London