Hypermedia Databases: A Specification and Formal Language

11

Transcript of Hypermedia Databases: A Specification and Formal Language

Hypermedia Databases: A Speci�cation

and Formal Language

Yoshinori Hara

1

and Rodrigo A. Botafogo

2

1

C&C Research Labs

2

Media Technology Research Labs

NEC Corporation, 4{1{1, Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216, Japan

Abstract. Improving authoring and browsing techniques is fundamen-

tal if large hypermedia applications are to be authored and browsed

e�ciently. This paper presents a new, two step approach, for the de-

velopment of hypermedia systems. First data modeling is done using

standard database techniques. Second, a selected part of the database is

\projected" onto the \hypertext world." Using this approach, hypertext

and database technologies are integrated forming a powerful symbiosis:

hypermedia databases.

Advantages of this new approach are: (a) applications can be developed

using structured design, (b) nodes and links can be automatically gener-

ated, (c) it becomes much easier to author and update the application,

(d) query mechanisms are improved, (e) the same data can be reused

for di�erent applications, (f) reduction of redundancies and inconsisten-

cies, data sharing, improved security, etc., are obtained by having the

hypertext build on top of a database management system.

Introdction

Once upon a time, a 200 line program was considered a feat of intellectual

power, but with new compiling techniques, structured and object oriented pro-

gramming, a 200 line program can now be written in an afternoon. 200 nodes and

a couple of hundred links, and hypertext developers start talking about medium

size hypertext. Make it a thousand nodes and a couple of thousand links and

we are talking very large hypertext. It is now great time to start developing

hypertext with thousands of nodes and some hundred thousand links. If such

hypertext sizes are ever to be reached, we need to start thinking about more

automatic authoring (just think how much time it would take to add 100,000

links manually).

However, in the hypertext community, when one talks about automatic au-

thoring, immediately the image of low quality hypertext, not properly tailored

for its purpose is conveyed. This is clearly not our goal. We want high quality

hypertext, well planned and clearly structured. It should be clear that only by

improving authoring techniques can one move towards this goal. Instead of say-

ing: \add a link from node 250 to node 1273," authors should say: \create an

index of all french painters sorted by their date of birth and add links from this

index to the appropriate painter node."

A �rst step for the improvement of authoring, browsing and search is the

development of stronger hypermedia models that escape or extend the node and

link paradigm. Only in the last European Hypertext conference, ECHT'92, three

such models are presented [3, 9, 15]. Although having composite nodes, typed

links, etc., is a requirement for future generations of hypertext systems, one is

still at lost when having to decide which nodes to aggregate, or what type of

links to use.

On the other hand, database technology has been concerned exactly with

the issues stated above. Through schema organization, declarative access, views,

and also aggregation and generalization or more recently with Object Oriented

Database Systems, a strong theory about data structuring and retrieval has

been developed. Database systems, however, lack some features that make the

strength of hypertext: author's structuring, navigational access, history, brows-

ing, etc.

This paper proposes new theoretical concepts, and practical formal language

operations that provide a natural integration of both hypermedia and database

technologies. Some e�ort has already been done in that direction [11, 12], but

basically the database is used to implement the underlying hypertext data model

and not for its strong data modeling capabilities. In this paper we propose a

two step authoring approach: �rst we model our data using a standard database

modeling approach such as the E-R model, the relational model, etc. On a second

step we \project" the database into the hypermedia space.

DesignPhilosophforHpermediaDatabase

Hypermedia technology has now matured to the point that authors are start-

ing to write large applications, such as engeneering manuals [8], electronic li-

braries [4], and large scale CSCW's [13]. However, writing a large application is

still a very complex process. Authors have to manage hundreds of nodes and

thousands of links manually and there is still the famous \disorientation" prob-

lem.

In order to develop large applications in a more e�ortless and less error-

prone fashion one needs to abandon add-hoc development techniques and move

to a structured design approach based on well de�ned design methodologies [14].

This approach was taken in database systems with the development of schemas

and schema languages. In the hypertext �eld, Garzotto et al. proposed this very

same idea; however claiming that, since hypertexts have di�erent characteristics

from databases new models needed to be developed. HDM{Hypertext Design

Model [5]{is the result of their e�orts.

We believe, however, that database models can and should be used in the

development of hypertext applications. In order to address the di�erent char-

acteristics of hypertexts, we propose a two step development approach. First,

information is modeled using standard database techniques. At this step hyper-

text is not considered at all. On a second step, we \project" selected parts of

the database onto the hypertext world (see Fig. 1).

Real World

E-R ModelO-O Model

Conceptual Modeling World

Entities

Relationships

Aggregation

Generalization

Reference

Views

Subset of ConceptualModeling World

Hypertext Projection

TraditionalHypertextApproach

DB Model

Nodes

Links

Clustering

View

(Web)

Fig. 1. Two step development process: First model the real world using standard DB

techniques, then project the DB onto the hypertext world.

Reusing database models provides great advantages: �rst, those models are

well understood and many commercial products exit that support their develop-

ment. Second, there is an abundance of well trained personnel. Third, there is a

large research community trying to improve those models further. Fourth, by do-

ing so, we provide a smooth connection between two technologies, database and

hypertext, and bring forth all the advantages of this integration. In particular

we can use existing database applications to start generating hypertexts imme-

diately. Also, if hypertexts are build on top of a database management system

we inherit extra functionality: reduction of redundancies and inconsistencies,

improved security and integrity maintenance, etc. Other advantages are:

Consistent Node Layout: Nodes are obtained from records by de�ning tem-

plates. The use of templates ensure that every node of a same type will

have a consistent layout. Furthermore, if the database allows hierarchies of

objects, layouts can also be inherited.

Automatic Link Generation: Relationships in a database are implicit, based

on record content. Using link de�nitions, i.e., by making relationships explicit

through some language constructs, links can be automatically generated,

greatly reducing the risk of mistakes such as forgotten or dangling links.

Easy to Author/Update: Since nodes are created through the use of tem-

plates, changing them will a�ect whole families of nodes consistently. Also,

since links are automatically generated based on link de�nitions, which are

easily added or removed, authors can experiment at will.

Many Applications/Same Data: Two main activities need to be performed

when trying to transmit information: collecting the data, and presenting it in

an interesting way for the reader. Those two activities, although interelated,

are quite di�erent. When you buy a book, you are not only buying the facts,

you are also buying the authors view of those facts. Dynamic hypertexts

(those in which links are created on the y) only give you the facts; static

hypertexts (structured beforehand by the author) give the facts and the view,

but there is no way to separate one from the other. This is very unfortunate,

as having the facts stored in electronic form should also permit you to easily

change its presentation.

Reconciling the Literalists and the Virtualists: For the literalists links are

created and represented explicitly and navigation is done by traversing those

links. The virtualists, on the other hand, say that any structure is implicit

in the form or content of the nodes, and links are computed over the nodes.

It is clear that each vision brings advantages and disadvantages.

We reconcile the two views in our two way authoring approach, by having

an author at \compile" time create link de�nitions, e.g., \Add links between

all 17th century painters sorted by date of birth," or \Add link between

politicians and the events in which they were participants." Those author

de�ned links are then browsed in a static way, but readers can issue their

own queries in the same query language obtaining dynamic links. In short,

static links are dynamic links (queries) issued by a knowledgeable author

prior to the application delivery.

FormalSpecificationsforHpermediaDatabase

A hypermedia database is a system that integrates database models and hyper-

text structures and in which it is possible to smoothly translate from one model

to the other. For the bulk of this paper, we will work with the relational model

and a minor extension to the node and link model. Although we concentrate

our analysis to the E-R model, a similar approach could be taken for any other

database model.

3.1 Value Space v.s. Object Space

De�nition1. The value space (V-space) is the database space, while the object

space (O-space) is the hypertext space.

One of the advantages to consider both the V-space and the O-space is that

several useful operations in these spaces can be de�ned: hypertext projection

and hypertext clustering between the V-space and the O-space; hypertext view

and hypertext view update in the O-space; relational view and relational view

update in the relational model (see Fig. 2). These operations integrate e�ectively

existing hypertext models with database technologies.

For lack of space, on this paper we will only discuss \hypertext projection."

Relational view and update are the same as in relational databases. For hypertext

clustering see [1, 2, 10, 6, 7]

HypertextClustering

Relational ViewRelational ViewUpdate

Hypertext ViewHypertext View Update

Hypertext Projection

HypertextClustering

V-space

O-space V’-space

O’-space

Fig. 2. Operations on V-space and O-space

In the �gure as one moves from top to bottom (V-space to O'-space) there is

a loss of information. However, while information is lost structure and relevance

are gained.

3.2 Hypertext Projection

Hypertext projection is an operation to translate relations in the V(V')-space

into a speci�c hypertext structure in the O-space. The basic procedure of hy-

pertext projection consists of the following three steps:

Forming Appropriate Relations: The �rst step is not really part of the pro-

jection, but it consists of forming, through relational operators (cartesian

product, projection, etc.), relations that are appropriate to be projected into

the O-Space. Which relations are appropriate depends, of course, on the ap-

plication being build. For instance, if one is constructing a hypermedia about

french painter of the 19

th

century, records containing painters from the 20

th

century might not be appropriate.

Creating nodes from tuples and relations: To create a node from a tuple

or a relation it is su�cient to specify a visualization for them. For tuples,

a visualization is a description of how and where each attribute should be

shown on the display. For a relation, the visualization describes a global

view of all its tuples. A node is, then, an explicit visualization of a tuple or a

relation. Note that the translation from an object in the V-space to a node

is one-to-one.

Creating links by specifying constraints: This step creates links between

related nodes. It consists of the following three sub-steps:

Specifying a set of source nodes, O

S

This step speci�es a set of nodes

to be used as source for the links.

Specifying a set of destination nodes, O

D

This step speci�es a set of

nodes to be used as destination for the links.

Specifying the constraint between O

S

and O

D

This step is necessary

to produce meaningful hypertext links. Examples of such constraints are

select all, i.e., all nodes in O

S

are connected to all nodes in O

D

, select

one, i.e., a node in O

S

is connected to a speci�c node in O

D

, etc.

TranslatingLangage

In the previous section we presented a method for translating from the V-space to

the O-space. In this section we make things more concrete, by presenting an SQL-

like language for the translation. Two steps are necessary for this translation:

creating nodes from relations and tuples, and adding links between nodes.

We will show how those constructs are applied by giving some examples. All

our examples will be based on a hypothetical art database, with painters from

many countries, their works, etc.

The general syntax for creating nodes is:

CREATE NODE [<Relation>]

[SELF:

[NAME = <string>];

[TEMPLATE = <template-name>];

[ASSOCIATE <attribute-commalist>

<field-commalist>] ];

[CHILD:

[NAME = f<string> | attributeg];

[TEMPLATE = <template-name>];

[ASSOCIATE <attribute-commalist>

<field-commalist>] ];

Arguments inside square brackets ([]) are optional, those inside angle brackets

(<>) are to be substituted by the appropriate arguments, and only one argument

from those in braces (fg) separated by 'j' is to be selected. A \Relation" is a

relation of the database; \string" is any string of character; \template-name"

is the name of a template; \attribute" and \�eld" are respectively attributes in

the relation and �elds de�ned in the template. The \commalist" indicates that

a list of elements separated by commas can be used. In ASSOCIATE the size of

the attribute-commalist and �eld-commalist should be the same.

The above construct creates two types of nodes: a composite node generated

directly from the given \relation," and a set of nodes obtained from the tuples

of the relation. There is an implicit ordering of those nodes, following the same

ordering as the tuples in the relation. Also, nodes inherit all the attributes from

the relation, even if they are not seen through the template. The SELF part

gives information on how to create the composite node, while the CHILD part

indicates how to create nodes from tuples. If SELF.NAME is omitted, this name

will be the same as the \relation." If TEMPLATE is omitted, the node cannot

be seen/browsed, but still exists. Finally, if ASSOCIATE is omitted, there is an

implicit relationship between the \attributes" and the \�elds" based on their

names. An example will make things clear.

Assume that a painter relation has at least attributes: name, birth, death,

photo and biography. The next command will create composite node \Painter"

and child nodes obtained from the tuples in the relation \Painter." For example,

if relation \Painter" had 10 tuples, 11 nodes would be created: 1 composite node

called \Painter," and 10 nodes created from the painter's tuples. Note that each

node will also receive a name coming from attribute \Painter.name."

// Create node from relation Painter.

CREATE NODE Painter

SELF: // Composite node

TEMPLATE = ``index.temp''; // will be an index.

ASSOCIATE = (name, birth), (name, date);

CHILD: // Nodes from tuples.

NAME = name; TEMPLATE = ``painter.temp''

ASSOCIATE = // Rel. -> Temp.

(name, birth, death, photo, biography);

(name, born, died, picture, description)

Assume now that for the application being created the author wants to have

a composite node having only the french painters. In that case two steps are

necessary: �rst, de�ne a view over the database using its access language (in our

example SQL). Then create the nodes:

// Creates a view FPainters for the database. For convenience uses

// the same names as the painter template attributes

CREATE VIEW FPainters (name, born, died, picture, description)

As SELECT Painter.name, Painter.birth, Painter.death,

Painter.photo, Painter.biography;

FROM Painter; WHERE Painter.country = ``France''

// Create nodes from the view

CREATE NODE FPainters

SELF:

TEMPLATE = ``browser.temp''; // Graphical browser.

CHILD:

NAME = name; TEMPLATE = ``painter.temp''

In the above speci�cation a set of nodes is created. Assuming that there are 5

french painters in the database, Fig. 3-(a) shows the painters' nodes create from

template \painter.temp," and Fig. 3-(b) shows the graphical browser created

from template \browser.temp." There is yet no way to browse through this set.

We now specify how to create links:

CREATE LINK [link-name]

SOURCE:

NAME = <node-name>;

IN fSELF | CHILDg;

[ANCHOR <field>];

DEST:

NAME = <node-name>;

IN fSELF | CHILDg;

[ANCHOR <field>];

DIRECTION fFORWARD | BACKWARD | BIDIRECTIONALg;

[WHERE <constraint-list>];

\link-name" speci�es the type of the link. SOURCE and DEST are respectively

the source and destination nodes of the links. If CHILD is speci�ed in the IN

clause, then links will be added to the children of the node; otherwise, the link

is added to the node itself. ANCHOR indicates to what �eld in the template

the link should be anchored. Note that DEST has also an ANCHOR. This is

necessary in case the DIRECTION of the link is either BACKWARD or BIDI-

RECTIONAL. WHERE speci�es constraints on the links. It is possible to use

in WHERE all attributes of nodes, e.g., SOURCE.name.

We now specify the in uence relationship form the \Impressionists" to the

\Post-impressionists." Links added are BIDIRECTIONAL so that both \in u-

enced" and \was in uenced by" traversals can be performed.

CREATE LINK Influenced

SOURCE:

NAME = FPainters; IN CHILD; ANCHOR ``Inf''

DEST:

NAME = FPainters; IN CHILD; ANCHOR = ``Inf By''

(a) French painters’ nodes arecreated.

Pictureof person

Descrip-tion

Name BornDead

Prev Next

Pictureof person

Descrip-tion

Name BornDead

Prev Next

Pictureof person

Descrip-tion

Name BornDead

Prev Next

Pictureof person

Descrip-tion

Name BornDead

Prev Next

Pictureof person

Descrip-tion

Name Born

Died

Next

(c) "Influence" relation added."Select all" constraint used.

Prev InfInf By

(b) No links between nodes yet.

French Painters

Graphical Browser

French Painters

Graphical Browser

NextPrev

NextPrev

French Painters

Graphical Browser

(d) "Next" added. "Select one"constraint used.

NextPrev

Fig. 3. Conversion from the V-space to the O-space.

DIRECTION BIDIRECTIONAL;

WHERE SOURCE.school = ``Impressionism'',

DEST.school = ``Postimpressionism''

Note that links do not need to be one-to-one (see Fig. 3-(c)). In this example it

is most likely that an one-to-many relationship exists. How to decide to which

node to jump when button \Inf" is clicked, is part of the user interface. One

possible solution, though, would be to show the list of all possible destination

nodes.

It is now possible to start browsing through the FPainter node, but it is not

necessarily true that all nodes are accessible. It would be interesting to have all

painters sorted by their date of birth and linked using a \next" button (see dotted

links in Fig. 3-(d))

3

. The sorting is done by de�ning a view on the database

(remember that there is an implicit ordering of the nodes which is identical to the

tuples' ordering), and the linking is similar as above. ANCHOR the link to the

\next" button, the DIRECTION is FORWARD, and the constraint \WHERE

SOURCE.next = DEST," where \next" is an implicitly de�ned attribute of the

node. Other attributes are: �rst, last, and a number, e.g., DEST.5.

Conclsion

In this paper we proposed a novel approach for authoring hypermedia applica-

tions: �rst, we model our data using standard database techniques, and then,

we project the database into the hypermedia space. This novel technique when

provided with four operations: hypertext projection, hypertext clustering, rela-

tional view, and hypertext view, e�ectively and smoothly integrates hypertext

and database technology creating what we call a hypermedia database.

With a formal framework to work with, it became possible to provide and

SQL-like language for the translation between the database world and the hy-

permedia world. This language not only provides this translation but can also be

used at run time to help retrieve information. Consequently, not only is author-

ing improved, as nodes and links can be created automatically, but also browsing

is enhanced. We believe, that this formal speci�cation and its declarative hyper-

media access language provides a useful perspectives for the next generation of

hypermedia systems.

Although for this paper we exempli�ed our approach using the E-R model

and an SQL-like language, the approach is general and could be applied for any

DB-model. What is requires is that the DB-model supports an access language

through which restructuring of the data is possible. In that case instead of an

SQL-like language, a language similar to the DB access language should be build.

References

1. R. A. Botafogo. Cluster analysis for hypertext systems. In 16th ACM SIGIR

International Conference on Research and Development in Information Retrieval,

pages 116{125, Pittsburgh, Pensylvania, June 1993.

2. R. A. Botafogo, E. Rivlin, and B. Shneiderman. Structural analysis of hypertexts:

Identifying hierarchies and useful metrics. ACM Transactions on Information Sys-

tems, 10(2):142{180, April 1992.

3. P. De Bra, G. Houben, and Y. Kornatzky. An extensible data model for hyperdoc-

uments. In Proceedings of the European Conference on Hypertext, pages 222{231,

Milano, Italy, 1992.

3

Do not confuse the \next" button in template \painter.tem" and the \next" button

in template \browser.temp." Specifying CHILD indicates that the links are to be

added to the painters.

4. D. E. Egan, M. E. Lesk, R. D. Ketchum, C. C. Lochbaum, J. R. Remde,

M. Littman, and T. K. Landauer. Hypertext for the electronic library? core sam-

ple results. In Proceedings of the Hypertext 91 Conference, pages 299{312, San

Antonio, Texas, December 1991.

5. F. Garzotto, P. Paolini, and D. Schwabe. HDM { A model based approach to

hypertext application design. ACM Transactions on Information Systems, 11(1):1{

26, January 1993.

6. Y. Hara, A. M. Keller, and G. Wiederhold. Implementing hypertext database re-

lationships through aggregation and exceptions. In Proceedings of the Hypertext

91 Conference, pages 75{90, San Antonio, Texas, December 1991.

7. Y. Hara, A. M. Keller, and G. Wiederhold. Relationship abstractions for an ef-

fective hypertext design: Augmentation and globalization. In DEXA'91, pages

270{274, 1991.

8. K. C. Malcolm and S. E. Poltrock. Industrial strength hypermedia: Requirements

for a large engineering enterprise. In Proceedings of the Hypertext 91 Conference,

pages 13{24, San Antonio, Texas, December 1991.

9. M. Marmann and G. Schlageter. Towards a better support for hypermedia struc-

turing: The hydesign model. In Proceedings of the European Conference on Hyper-

text, pages 232{241, Milano, Italy, 1992.

10. E. Rivlin, R. A. Botafogo, and B. Shneiderman. Navigating in hyperspace: De-

signing a structure-based toolbox. Communications of the ACM., 37(2):87{96,

February 1994.

11. J. L. Schnase, J. J. Leggett, and Szabo R. L. Semantic data modeling of hyperme-

dia associations. ACM Transactions on Information Systems, 11(1):27{50, January

1993.

12. H. A. Sch�utt and N. A. Streitz. Hyperbase: A hypermedia engine based on a rela-

tional database management system. In Proceedings of the European Conference

on Hypertext, pages 95{108, Paris, France, 1990.

13. K. Watabe, S. Sakata, K. Maeno, and H. Fukuoka. Distributed multiparty desktop

conferencing system: MERMAID. In Proceedings of the Conference on Computer-

Supported Cooperative Work, pages 27{38, Los Angeles, CA, October 1990.

14. G. Wiederhold. Database Design. McGraw-Hill, 1983.

15. Y. Zheng and M. Pong. Using statecharts to model hypertext. In Proceedings of

the European Conference on Hypertext, pages 242{250, Milano, Italy, 1992.

This article was processed using the L

A

T

E

X macro package with LLNCS style