Generating content and display of library catalogue cards using XML technology

12
SOFTWARE—PRACTICE AND EXPERIENCE Softw. Pract. Exper. 2006; 36:513–524 Published online 19 January 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/spe.707 Generating content and display of library catalogue cards using XML technology Jovana Vidakovi´ c ,† and Miloˇ s Rackovi´ c Department of Mathematics and Computer Science, Faculty of Science, Trg Dositeja Obradovi´ ca 4, 21000 Novi Sad, Serbia and Montenegro SUMMARY This paper presents the use of XML technology in modelling library documents, i.e. catalogue cards (and the type of reports) found in a library information system. The method of schema formation for content of various types of library catalogue cards is also described. The display of catalogue cards has been done based on described schemas. The display process has been performed in two steps. The first one extracts the content from a bibliographical record based on the schema describing catalogue card concepts. The result is an XML document containing all catalogue card concepts filled in with the corresponding content from the bibliographical record. Its display is being performed in the second step, resulting in an HTML document. As well as adequate representation of data obtained by searching bibliographical material databases, reporting is of prime importance in a library information system. Copyright c 2006 John Wiley & Sons, Ltd. KEY WORDS: XML; XML Schema language; catalogue cards; concepts; display INTRODUCTION There are various types of reports in librarianship. One kind comprises reports about bibliographical material relating to reports on processed publications in the local or remote bibliographical material databases. They include catalogue cards, among other things. Catalogue cards are reports describing objects in a library collection. A catalogue card contains concepts. A concept contains fields, fields contain subfields and subfields contain subsubfields. The International Standard Bibliographic Description (ISBD) standard is used for defining the content and appearance of catalogue cards. The ISBD standard [1] is an international standard for bibliographical description and it states exactly the order of bibliographical data, the application of a particular punctuation system and data retrieval Correspondence to: Jovana Vidakovi´ c, Department of Mathematics and Computer Science, Faculty of Science, Trg Dositeja Obradovi´ ca 4, 21000 Novi Sad, Serbia and Montenegro. E-mail: [email protected] Copyright c 2006 John Wiley & Sons, Ltd. Received 16 July 2004 Revised 28 March 2005 Accepted 6 April 2005

Transcript of Generating content and display of library catalogue cards using XML technology

SOFTWARE—PRACTICE AND EXPERIENCESoftw. Pract. Exper. 2006; 36:513–524Published online 19 January 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/spe.707

Generating content and displayof library catalogue cards usingXML technology

Jovana Vidakovic∗,† and Milos Rackovic

Department of Mathematics and Computer Science, Faculty of Science, Trg Dositeja Obradovica 4,21000 Novi Sad, Serbia and Montenegro

SUMMARY

This paper presents the use of XML technology in modelling library documents, i.e. catalogue cards(and the type of reports) found in a library information system. The method of schema formation forcontent of various types of library catalogue cards is also described. The display of catalogue cards hasbeen done based on described schemas. The display process has been performed in two steps. The first oneextracts the content from a bibliographical record based on the schema describing catalogue card concepts.The result is an XML document containing all catalogue card concepts filled in with the correspondingcontent from the bibliographical record. Its display is being performed in the second step, resulting inan HTML document. As well as adequate representation of data obtained by searching bibliographicalmaterial databases, reporting is of prime importance in a library information system. Copyright c© 2006John Wiley & Sons, Ltd.

KEY WORDS: XML; XML Schema language; catalogue cards; concepts; display

INTRODUCTION

There are various types of reports in librarianship. One kind comprises reports about bibliographicalmaterial relating to reports on processed publications in the local or remote bibliographical materialdatabases. They include catalogue cards, among other things. Catalogue cards are reports describingobjects in a library collection. A catalogue card contains concepts. A concept contains fields,fields contain subfields and subfields contain subsubfields. The International Standard BibliographicDescription (ISBD) standard is used for defining the content and appearance of catalogue cards.The ISBD standard [1] is an international standard for bibliographical description and it states exactlythe order of bibliographical data, the application of a particular punctuation system and data retrieval

∗Correspondence to: Jovana Vidakovic, Department of Mathematics and Computer Science, Faculty of Science, Trg DositejaObradovica 4, 21000 Novi Sad, Serbia and Montenegro.†E-mail: [email protected]

Copyright c© 2006 John Wiley & Sons, Ltd.Received 16 July 2004

Revised 28 March 2005Accepted 6 April 2005

514 J. VIDAKOVIC AND M. RACKOVIC

Figure 1. An example of a catalogue card.

from predetermined sources. According to the ISBD standard, elements of a bibliographical descriptionare grouped in areas whose order, content, language and alphabet are exactly defined. A catalogue cardis made using ISBD principles. An example of a catalogue card is given in Figure 1.

The XML technology is suitable for use in library information systems since library documentsare structured and it is possible to use XML documents to model them. Library data are convertedinto XML documents, as described in several projects. The conversion of MARC records to XMLdocuments has been done in the Java programming language, and it has been described in the MedlineProject of Stanford University Medical Center [2].

Validation of XML documents is done with the help of DTD. The USA Library of Congress hasdeveloped the MARC XML project which features the manipulation of MARC data in the XMLenvironment [3].

Conversion between MARC and MARC XML, published by the Library of Congress, is done usingMARC4J. MARC4J is the library for working with MARC records in Java [4]. Using MARC4J, it iseasy to write any kind of Java application or servlet that involves MARC or MARC XML data.

At the University of Buffalo, State University of New York, MARC records were converted to XMLcatalogue pages containing bibliographical information, holdings information, and more [5]. Using anXSL stylesheet with properly tagged XML files, all of the data in the MARC records were preservedand control of the appearance of the page was gained. Citation elements of interest to the general user,such as author, title, call number and subject, were displayed and labelled.

In Zeremski [6] and Jaksic [7] the schema of the UNIMARC format and the bibliographical recordschema have been described, as well as bibliographical records validation.

Currently, available references do not give an insight into the way of modelling library reports.If reports are accessible, only the final result of the database search of a library information system isvisible, i.e. the display of data about the publication searched for.

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY 515

The development of the library information system BISIS started in 1993 at the University of NoviSad [8]. So far it has featured reports modelled by various means. In [9] the extended model of theentity relationship vocabulary of the catalogue cards has been given, and the complete specification isdone using the ORACLE CASE tool Designer 2000. Vulic [10] shows the object-oriented specificationof the system for reporting using the unified modelling language. In the current version of BISIS,library reports are generated using the Java programming language.

As a part of further development of BISIS, this paper describes the method of modelling librarycatalogue cards using the XML Schema language. First, the concepts of different types of cataloguecards are described by a schema, and then the particular types of catalogue cards are described byseparate schemas. The schema is used to completely describe the concepts and content of the cataloguecards. In view of the described procedure it is possible to extend the document by new concepts, and itis also possible to model other types of library documents.

This paper also presents two programs written in the Java programming language. The first oneextracts the content out of the XML library record following rules of the described schemas, forms anew XML document which is then displayed as HTML by the second program. The detailed descriptionof the system for generating and displaying library catalogue cards is shown in [11].

SPECIFICATION OF LIBRARY CATALOGUE CARD CONCEPTS USING XMLSCHEMA LANGUAGE

During analysis of the catalogue card concepts it has been noted that there are concepts recurring onvarious types of catalogue cards [12]. Catalogue card concepts have been modelled by a single schema,i.e. the content of each concept is described by citing the fields and subfields which it is made up of.This schema contains a description of the concepts, fields and subfields. The punctuation is also given,and particular catalogue card types are modelled by schemas invoking corresponding concepts by citingtheir names.

A part of the schema for catalogue card concepts description is given in Figure 2.The root element is Concepts. It consists of subelements representing concepts. Each concept in

the schema is modelled either by a sequence or choice of elements representing the fields comprisinga particular concept. The exception is the CallNumber concept, where subfields are cited instead offields, subfields and subsubfields. This has been done so as to achieve the consistency of the same levelof concepts. Attributes appearing for concepts, fields and subfields define punctuation, display manner,alphabet, processing rule, justification, and font.

The punctuation relating to the concept is of the String type and it can have the following values:new line, next line. It denotes the place on the catalogue card where the content of the new concept willbe printed in relation to the previous concept.

The punctuation for fields and subfields is an attribute of the String type representing a characteror a number of characters preceding the display of a field or subfield from the record. If a subfield isrepetitive and the punctuation is changed during the next occurrence, the attribute punctuationRep isintroduced. The special case is conditional punctuation modelled by punctuationCond. This attributeis cited in case the punctuation of a field depends on other fields occurring in the record. The valueof this attribute is a conditional expression starting with a keyword if and followed by an expressiondepending on what is taken into account when determining punctuation.

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

516 J. VIDAKOVIC AND M. RACKOVIC

Figure 2. A part of the catalogue cards concepts description schema.

The rules for processing are as follows: aggregation, or, and. The application of a certain ruledepends on which fields, i.e. subfields, should be included in the particular concept. The processingtype is modelled by the attribute processing type. The aggregation rule processes all cited elements,i.e. fields and subfields, as they appear, and stores all those in the record to the concept content. This ruleis represented as a sequence of corresponding elements, i.e. by an xsd:sequence element. The or rule isapplied when the concept content should contain only one of the cited fields. As soon as the first fieldfrom the list is found in the bibliographical record, the processing is terminated. It is also possible torepresent this rule as a choice of elements on the lower hierarchy level, i.e. by an xsd:choice element.The and rule is applied in the case that content of a concept should contain all cited fields. If anyof the fields do not exist in the record, processing of the entire concept is unsuccessful. This rule isrepresented as a sequence of elements.

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY 517

The mode of display is modelled by the justification and font attributes. The attribute justificationis cited only with concepts whose content should be justified to the right and it has the valueright. These are the concepts CallNumber and InventoryNumber. With the remaining concepts it isunderstood that the content is justified to the left, so it is not cited. The font attribute is cited withconcepts where the font is different from the standard font in which catalogue cards are being printedor the font size is different from the standard one. The standard font is Courier New, and the standardsize is 10 pts. The attribute font can also have the following values: bold, if the contents should beemphasized, or compressed, if the letters should be less in the display.

The CallNumber and Headword concepts are represented as the choice of fields, i.e. subfields fromthe bibliographical record which comprises them. The rule for processing these concepts is or, and thatis represented by the xsd:choice tag.

CallNumber is a mark put on library material in order to identify it and locate it in the warehouse.It consists of numbers, letters or combinations of the two. The data on location of the givenbibliographical unit (call number) are located in the fields 996 or 997, in the subfield d which isrepetitive. So, one of the two subfields is chosen: 996d or 997d. CallNumber is printed at the topright corner of the catalogue card, so the value of the justification attribute is right. The punctuation ofthis concept is not specified. The processing type is or, because the elements are processed in the givenorder and the processing terminates as soon as one is successfully processed.

The field 996 subfield d is a complex type consisting of the sequence of elements, i.e. subsubfields.The rule for processing subsubfields is aggregation, because the subsubfields are processed in the givenorder and the subsubfield found in the record is being printed on the catalogue card.

The subsubfields have a defined punctuation. For the subsubfield 1 of the field 996 of the subfieldd (location mark) the punctuation is not specified, since it is the first subsubfield in the concept.The subsubfield n of the field 996 of the subfield d (current number) has the punctuation ‘–’, andthe subsubfield f of the field 996 of the subfield d (format) has the punctuation ‘/’.

Yet another concept is the main description. The main description on a catalogue card comprisesbibliographical cataloguing data on the publication from the block 2. It is the content of all of thefields from block 2 present in the given record. The display of the main description starts with the title(subfield 200a) from the fourth character of the entry, while the next lines of the main description areprinted starting with the first character of the entry. While printing the separate subfields, punctuationrules defined in the ISBD standard, or another set of rules connected to practice, are respected.Depending on the publication type, the appearance of the main description on the catalogue card varies.If the monograph with individual entry is in question, the main description is as shown in the Figure 3(CataloguingDescMon).

The concept is modelled by a complex structure, i.e. a sequence of elements representing fieldsbeing contained by the main description for monographs. The rule for processing is aggregation,because the fields are processed in the given order. The rule for processing subfields within fields is alsoaggregation, since it is possible that the bibliographical record does not contain any of the subfieldswhich are cited in the schema. Each subfield has an attribute punctuation, whose value is printedbefore the content of the subfield from the record. This concept can also contain repetitive subfields,where the punctuation differs when appearing for the first and subsequent times, so the attributepunctuationRep is introduced, whose value is printed before the repetitive occurrence of the subfield.There are also subfields where punctuation depends on the occurrence of the remaining subfields inthe record. These subfields have a defined attribute punctuationCond. The order of the application

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

518 J. VIDAKOVIC AND M. RACKOVIC

Figure 3. The CataloguingDescMon concept.

of the punctuation during processing is as follows: punctuationCond, punctuationRep, punctuation.The punctuation for a repetitive field is applied in case a subfield which has already appeared in therecord and gains advantage over the regular punctuation.

Figure 4 gives an example of a subfield with attributes punctuation and punctuationRep.An example of a subfield with conditional punctuation is shown in Figure 5. If the subfield i of the

field 225 is not preceded by the subfield h, but another, then the subfield i is preceded by the punctuation‘.’ (point break); otherwise it is preceded by ‘,’ (comma break). The conditional expression modelling

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY 519

Figure 4. A subfield with repetitive punctuation.

Figure 5. A subfield with conditional punctuation.

this case is as follows: if(h not before) ‘.’, checking at the subfield i if there is not a subfield h beforeit, setting the punctuation to a new value.

The main description by other kinds of publications is modelled in a similar way. Each of them ismodelled by a separate concept. Concepts differ only in the content of the field appearing in each ofthem. For example, the main description for serial publications has the field 207 instead of 205, etc.

The remaining concepts are modelled in much the same way, as described in detail in [11].

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

520 J. VIDAKOVIC AND M. RACKOVIC

Figure 6. Main catalogue card for monograph with an individual headword.

SPECIFICATION OF XML SCHEMA FOR DESCRIBING PARTICULAR CATALOGUECARDS

By describing appearance and content of concepts in a schema, it is possible to define all card typesusing these concepts. When defining these kinds of documents only the concept names are givenwithout providing any details of the content, the display form and punctuation of the field, subfieldor subsubfield present in the mentioned concept, as is described in [11].

The catalogue card about to be described in this paper is the main catalogue card for the monographwith an individual headword. This catalogue card type consists of the following concepts: call number,headword, main description for monograph with individual headword, remarks, ISBN, UDC numberand inventory number. The schema modelled for this catalogue card type is shown in Figure 6 in theSchema Design view of the XML Spy Editor.

It is possible to model XML Schema for an arbitrary type of catalogue card consisting of conceptsspecified in the schema for catalogue card concepts description in a similar way.

IMPLEMENTATION OF GENERATING CONTENT AND DISPLAY OF LIBRARYCATALOGUE CARDS

In the library information system BISIS bibliographical records are saved in a UNIMARC format.A user searches the database by querying a particular field or fields, for example by author or booktitle, and gets a record, in the UNIMARC format, as a result. In the future development of the BISISit is planned to use XML technology so that records are transformed from the UNIMARC format toXML documents. The obtained XML document should be valid according to the schema describedin [6]. If a document is valid a catalogue card will be created.

Catalogue cards are displayed with the help of two programs written in the Java programminglanguage. The first program extracts content from an XML bibliographical record based on the schemadescribing catalogue card concepts and the schema describing a particular type of catalogue card.

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY 521

Figure 7. Class diagram of the first program.

The program first traverses the structure of the schema describing a particular type of catalogue cardand finds the names of the concepts the card consists of. After that, the program looks for these conceptsin the schema for describing the catalogue card concepts. It looks for the list of fields, subfields andsubsubfields of the concept. Afterwards, the cited fields and subfields, i.e. subsubfields, are searchedfor in the record, and their content is extracted. The punctuation, found in the schema describing thecatalogue card concepts, appends to the record content. The content with the punctuation is placed intoa new XML document whose tags have names of the concepts. Beside the content, all data importantfor display, such as font and justification, are placed into that XML document.

The class diagram of the first program is shown in Figure 7. The class FirstPass is the startingclass in the first program and it uses methods from classes XMLUtils and MyXMLParser. The classXMLUtils is realized in the current BISIS version and is used to work with XML documents. The classMyXMLParser transforms the input XML file into a DOM (Document Object Model) tree.

The class FirstPass uses the methods of the ConceptProcess class which process concepts.The attributes of the concept are being found, and subsequently so are the fields that the conceptconsists of. The names of the fields are stored in a list for further processing. All concepts, exceptfor ArabicTracing and RomanTracing concepts, are further processed using the methods of theFirstLevelProcess class. Based on the list of field names from concepts, the appropriate fields inthe record are being searched for. The subfields of the found field are also put into the special list,because it is important to know if a subfield has already appeared in the record in order to determineits punctuation. If punctuation of a subfield depends on the appearance of another subfield in therecord, the conditional punctuation is applied by processing the corresponding conditional expression.The concepts ArabicTracing and RomanTracing are enclosed in separate classes due to their structurebeing modelled in a different way from other concepts.

The resulting XML document is an instance of the schema modelled for checking if newly generatedXML documents are valid. Part of this document is shown in Figure 8. The group attributes is separated

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

522 J. VIDAKOVIC AND M. RACKOVIC

Figure 8. Part of the schema for checking if output XML documents are valid.

Figure 9. Class diagram of the second program.

because they appear in all concepts, as shown in the schema describing the catalogue card concepts.The document formed in this way is ready for display which is carried out by the second program.

In the second program the tags relating to display are found and replaced with the appropriate HTMLtags and attributes. For example, the content of the concept CallNumber is written right justified.The content of the attribute justification is put in the attribute ALIGN in the HTML tag TD, sinceit provides particular indentation. The corresponding places in the HTML code store the content of thetag content. Each of the attributes is processed by a separate class, and the class diagram of the secondprogram is shown in Figure 9.

The final result of these two programs is an HTML document ready for presentation by the browser.An example of the catalogue card display generated by the described application and shown by thebrowser can be seen in Figure 10.

The detailed specification and the examples of outputs from the programs are shown in [11].The HTML document which is the final result of these programs can be used as a means of showing

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY 523

Figure 10. An example of a catalogue card.

existing catalogue records on the screen in the traditional catalogue card format using a Web browser.This application can be extended for printing catalogue cards, so that catalogue cards for new andexisting catalogue records can be printed in the traditional format.

CONCLUSION

In this paper a way of bibliographical catalogue cards modelling using the XML Schema languageis described, as well as display of these cards. A general description of catalogue cards is made bymodelling the schema describing catalogue card concepts. Particular catalogue cards are modelled byindividual schemas. The implementation is done in the Java programming language based on theseschemas.

This way of modelling bibliographical catalogue cards provides a simple method for adding newconcepts and types of catalogue cards and for modifying existing ones. If an existing type of a cataloguecard should be changed or a new type should be made using existing concepts, it is enough to change

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524

524 J. VIDAKOVIC AND M. RACKOVIC

or make a schema for that type of catalogue card by citing only the concepts which it consists of, whichcan be done by the end user, i.e. a librarian.

If a new concept should be created or an existing one should be modified, it is enough to change itor add it to the general schema by citing all the fields and subfields comprising the concept. It shouldbe done abiding by the rules which the schema has been modelled by, so as to enable the applicationto process the new concept as well.

Only if a new concept cannot fit into any of the existing rules, or needs some specific processing, isit necessary to change the existing programs.

This system for generating and showing bibliographical catalogue cards should be integrated intothe library information system BISIS based on the XML technology. The system can be expanded sothat it can support other types of bibliographical reports. This way of report modelling could be appliedto any other structured report.

ACKNOWLEDGEMENT

This paper is part of the scientific research project ‘XML technologies and cooperative information systems’no. 1875, supported by the Ministry of Science, Technologies and Development, Republic of Serbia.

REFERENCES

1. Family of ISBDs. http://www.ifla.org/VI/3/nd1/isbdlist.htm [19 January 2005].2. Clarke KS. Medlane/XMLMARC update: From MARC to XML database, a project of Lane Medical Library, Stanford

University Medical Center. http://elane.stanford.edu/laneauth/medlane 2001.html [19 January 2005].3. MARC XML—MARC 21 XML Schema, Official Web site. http://www.loc.gov/standards/marcxml/ [19 January 2005].4. Peters B. MARC4J http://marc4j.tigris.org/ [19 January 2005].5. Ludwig M. Breaking through the invisible Web—1/15/2003, netConnect, CA266430.

http://www.schoollibraryjournal.com/article/CA266430.html [19 January 2005].6. Zeremski M. Modelling of UNIMARC format in XML technology. Master Thesis, Novi Sad, 2002 (in Serbian).

Available at: http://diglib.ns.ac.yu/ndltd/docs/set1/ndltd64/ZeremskiMMagistarskaTeza.pdf [19 January 2005].7. Jaksic M. Mapping of bibliographical standards into XML. Software: Practice and Experience 2004; 34(11):1051–1064.8. Surla D, Konjovic Z (eds.). Distributed Library Information System BISIS. Group for Information Technologies: Novi Sad,

2004 (in Serbian).9. Savic N. System for generating catalogue cards in Library Information System. Master Thesis, Belgrade, 1998 (in Serbian).

10. Vulic T. The reporting and documentation process of the Library Information System. Master Thesis, Novi Sad, 1997(in Serbian).

11. Vidakovic J. Modelling and implementation of bibliographical catalogue cards in XML technology. Master Thesis, NoviSad, 2003 (in Serbian). Available at:http://diglib.ns.ac.yu/ndltd/docs/set2/ndltd335/JovanaMagistarski.pdf [19 January 2005].

12. Vidakovic J, Rackovic M. Modelling the concepts of bibliographical catalogue cards using XML Schema language. NoviSad Journal of Mathematics 2003; 33(2):95–102.

Copyright c© 2006 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2006; 36:513–524