Paganelli https://journals.tdl.org/jodi/index.php/jodi/rt/printerFriendly/67/149

Journal of Digital Information, Vol 6, No 3 (2005)

A Model-driven Method for the Design and Deployment of Web-based Document Management Systems

Federica Paganelli and Maria Chiara Pettenati
Department of Electronics and Telecommunications, University of Florence,
v. S. Marta 3, 50139, Firenze, Italy
Email: [email protected], [email protected]

Abstract

Most existing Document Management Systems (DMSs) are designed according to an approach which is technology-driven rather than based on standard methodologies. Related shortcomings are vendor dependence, expensive maintenance and poor interoperability. Information model-driven methodologies could help DMS designers to solve these issues. As a matter of fact, information models can provide a technology-independent abstract representation of information systems' functionalities. Based on standard formalisms, they are useful to designers to describe the managed domain and to developers to understand and develop the modeled entities according to a standard methodological approach. However, while information models are commonly used by software designers for the design of information systems, such as databases and digital libraries, their use in DMS design is still in its infancy. This paper provides a contribution in this research area proposing a method for Web-based DMS design based on an information model, named Document Management and Sharing information Model (DMSM). We have also developed a set of tools, the DMSM Framework, that provides designers with DMS design and deployment facilities. Based on this instrumental support, the proposed method facilitates the design and fast prototyping of DMSs, dealing with requirements of open standard compliance, cost effectiveness and uniform access to heterogeneous data sources.

1. Introduction

Document Management (DM) is a critical issue for every kind of organization, where a lot of effort is spent in properly creating, distributing and managing documents. While just some organizational information is stored in relational databases, a relevant percentage is available in unstructured digital formats (the451, 2002). Documents available in unstructured formats (also called throughout this paper unstructured documents) are commonly-used text and multimedia documents. In a typical company, reports, contracts and agreements are available as word-processor documents, marketing presentations as slideshows, technical seminars as a/v files and streaming media, and product description as images and CAD files. The characteristics of unstructured documents pose several challenges for their effective management. This situation is due to several factors (Paganelli, 2004; Fisher & Sheth, 2004):

Characteristics of unstructured formats. Unstructured formats do not provide an explicit, formal and separate representation of content structure (i.e. "logical structure") and presentation instructions (i.e. "physical structure"). In unstructured formats, logical and physical structures are generally blended and cannot be processed separately, and applications do not have explicit references to specific content elements. Consequently, software applications have limited capabilities regarding content processing and rendering functionalities (e.g. searching and indexing). Effective indexing, retrieval and processing would require the system to be able to access document content with a degree of granularity that cannot be provided by unstructured document formats.

Wide adoption of proprietary formats. Most commonly-used formats are proprietary and binary. Information about logical and physical structure, when available, cannot be easily interpreted and processed by heterogeneous applications. As a consequence, the reuse of information across heterogeneous communities, using heterogeneous applications for document creation and sharing, can be extremely cumbersome.

Heterogeneity of formats. Data are stored in heterogeneous formats which differ structurally and syntactically. This situation leads to an inherent inefficiency in digital data management. For instance, while a human perceives reading an e-mail, a word-processor document or a web page in the same manner, a machine processes different material in regard to structure, syntax and internal representation (Fisher & Sheth, 2004).

Distribution of data sources. Organizational information is distributed among various physical locations on a network (e.g. mail servers, http servers, PCs, etc.). The access to distributed data sources requires different protocols. Moreover, users may need specific interfaces to search for and access heterogeneous and distributed data sources (e.g. mail client, web browser, etc.). The non-uniformity of the access can disorient users and compromise the quality of their work.

Any system that is expected to access and manage unstructured documents effectively should deal with these critical aspects.

1.1 High-level requirements for Document Management Systems design

A Document Management System (DMS) is "the ensemble of applications which enable the automatic execution of storage, organization, transmission, retrieval, manipulation, update and eventual disposition of documents to fulfill an organizational purpose" (Sprague, 1995, p. 32).

In order to deal with the above-mentioned issues related to unstructured document management, DMS design should meet the following non-functional requirements:

Design method based on information models. The adoption of a method based on information models and standard formalisms should be considered as a basic requirement for the design and development of a high-quality document management solution (Ginsburg, 2001; Salminen et al., 2000; Murphy, 1998). Benefits of information models for information system design are discussed in Section 2.1.

Standard compliance. Compliance with international and widely adopted standards promotes interoperability among heterogeneous information systems, and facilitates heterogeneous data sources management.

Uniform access to heterogeneous data sources. The interface of a DMS should provide users with a uniform access paradigm for searching and browsing distributed intelligence available as documents in heterogeneous formats and stored in heterogeneous locations.

Cost effectiveness. Implementation of a DMS in an organization has a relatively high cost, in terms of user license, required storage space and maintenance costs. On the other hand, efficient solutions for document management can introduce cost savings (e.g. reducing storage costs and time spent retrieving critical documents).

1.2 Our contribution

Moving from these considerations, this paper proposes a method for the design and deployment of Document Management Systems in organizations, which has its foundation on an XML-based information model, the Document Management and Sharing Model (DMSM), fully described in some previous works (Paganelli, 2004; Paganelli et al., 2005). The DMSM aims to represent, in the form of digital metadata, a set of formal document characteristics and properties which are relevant to document management, and to render business and organizational information explicit, in a way which promotes information reuse, user-driven extensibility and interoperability with heterogeneous systems. The serialization of the DMSM in XML language, named Document Management and Sharing Markup Language (DMSML) (Paganelli, 2004), provides a declarative language supporting the design, deployment and operation of a DMS.

The proposed method aims at defining general guidelines and a standard methodological approach for DMS design. Requirements and design specifications are organized and defined using DMSM modeling entities. DMS deployment is then based on the DMSML, an XML-based declarative language.

In order to provide instrumental support to the proposed method and to facilitate the definition of metadata-based technical specifications from socio-organizational requirements, we have developed a DMSML Framework, described in this paper in its prototypal version. The DMSML Framework is an integrated set of tools, which provide intuitive and user-friendly interfaces for the creation of DMSML specifications and the deployment of a Web-based DMS, customized according to those specifications. Thanks to DMSML Framework features, the proposed method supports the conception and deployment of a document management solution, meeting the requirements of a design method based on information models (i.e. the DMSM) and open standard compliance.

The paper is organized as follows: Section 2 discusses the main benefits of information models and evaluates current approaches for DMS design for commercial as well as for open source solutions. Section 3 describes the main characteristics of the DMSM, grounding the proposed method. Section 4 details the DMSM-driven method for DMS design. Section 5 describes the architecture of the DMSML Framework and Section 6 shows the facilities provided by the DMSML Framework for DMS design, development and deployment. Section 7 discusses the results and provides insights into future work and Section 8 concludes the paper.

2. Background

2.1 Benefits of Information models for Document Management System Design

Information models are abstract and technology-independent representations of managed objects, as defined in literature (Pras & Schoenwaelder, 2003). Information models (IMs) are used in the early stages of the software development cycle for analysis purposes and business requirement elicitation. An information model can be specified in an informal way (e.g. using natural language) or by means of standard formalisms. In the latter case the features of an information system can be represented in a way which enables both human and machine understanding. The advantages of information models based on standard formalisms in the design of complex information systems are universally recognized:

IMs make requirements explicit and formalize them in a precise way. More specifically, they are useful to designers to describe the managed domain and its entities, to operators to understand the modeled entities, and to implementers as a guide to the functionality that must be implemented by means of specific technologies (Pras & Schoenwaelder, 2003). Moreover, information models can be used as a formal basis for the development of access and query paradigms of application interfaces made available to end users for information browsing and retrieval.

They provide an abstract representation of features of an information system, by representing them in a technology-independent way. Moreover, this abstraction from implementation details can promote interoperability and reuse of system design.

Based on modeling formalisms, proper tools can be developed enabling automatic or semi-automatic code generation.

Thanks to these advantages, information models are well recognized and commonly used in the design and development of information systems in several application domains. Some of them are strictly related to Document Management, such as enterprise modeling, database, hypermedia system and digital library design, to mention a few.

Enterprise Modeling methods include: business and business process modeling methods, such as the Fundamental Business Processing Modeling Language (FBPML) (Chen-Burger et al., 2002) and the Web Information Exchange Diagram (WIED) (Tongrungrojana & Lowe, 2004), organizational modeling (van der Aalst et al., 2003), and capability and enterprise ontologies (Ushold et al., 1998). Database design methodologies are traditionally based on information models. The relational model (Elmasri & Navathe, 2003) was the first formal database model. More recently, models were defined for object-oriented (Elmasri & Navathe, 2003) and semi-structured databases (Graves, 2001). Relevant contributions in the field of hypermedia information system design are: Dexter model (Halasz & Schwartz, 1994), WEBML (Ceri et al., 2000), and Ariadne (Montero et al., 2004). The 5S Formal Framework (Gonçalves et al., 2004) represents one of the most relevant attempts to provide a comprehensive formalization for Digital Libraries design, providing the formal foundation for the definition of a Digital Library (DL) declarative language and a DL generator tool.

Information models, together with metadata and markup languages, are widely recognized as mechanisms enabling high-quality DMS design (Ginsburg, 2001; Salminen et al., 2000; Murphy, 1998). Based on these seminal contributions, other works (Päivärinta, 2001; Karjalainen et al., 2000) provide high-level guidelines and principles for DMS requirement elicitation, but they also highlight the need for a methodology for translating socio-organizational requirements into metadata-based technical specifications. Despite that, the study of model-based methods for design and development of Document Management Systems is still in its infancy.

Although the above-mentioned information models can provide useful hints and guidelines, ad-hoc conceptual and methodological frameworks should be developed for the organizational document management field. For instance, digital library concepts cannot be easily adapted to the organizational context. As a matter of fact, the author-publisher-reader model, which is typical of digital library information models, cannot be conveniently used to model the information lifecycle inside an organization because it cannot properly express business process requirements and roles and responsibilities defined in an organizational environment (Murphy, 1998). Hypermedia information system design methods focus on navigation, presentation, structure and behavior issues which differ from DMS design requirements. As a matter of fact, "in hypermedia applications, information is split into a number of self-contained and unstructured nodes that are connected to related nodes by means of links" (Montero et al., 2004). On the contrary, when dealing with unstructured documents, information is provided by a chunk of content which does not explicitly contain direct links to other information items. Enterprise modeling provides useful instruments to model the organizational context in terms of actors, organizational roles and processes, but documents are usually considered as information resources supporting specific process steps, rather than as "first-class" entities. As a consequence, enterprise models do not aim at supporting traditional DMS features (e.g. document classification, search and retrieval, etc.). Database models deal with a different kind of content (i.e. mostly structured information), but can provide useful guidelines for model-driven design. As a matter of fact, our approach is based on conceptual and logical model-driven design, which derives from widely-accepted model-driven database design methodologies.

2.2 Current approaches for document management system design

At present, several Document Management Systems are available on the market, both as proprietary and open source solutions. According to Moore and Markham (2002), some of the most important solutions in terms of offered features and market diffusion in the domain of Document Management are: Documentum, FileNet, IBM Lotus Notes, Interwoven, Microsoft SharePoint, and Stellent. Among the open source products, OpenCMS, Apache Lenya, MARIAN, and Xinco deserve to be mentioned (1).

These systems provide a wide range of functionalities supporting the employees in the use of organizational information. An evaluation of DMS products according to some functional and technical requirements has been provided by Hendley (2005). For the purpose of this paper, we will evaluate some of these products according to their compliance with the following requirements for DMS: open information model, standard compliance, model-driven design methodology.

The analysis synthesized in Table 1 refers to two commercial products, FatWire Content Server and Documentum, and two open source products: MARIAN and Xinco.

The analysis of these products highlights that only one product, MARIAN, is based on an information model, the 5S (Streams, Structures, Spaces, Scenarios, Societies) Formal Model (Gonçalves et al., 2004), and a design methodology is in progress, based on the 5S model. The other products provide neither an open and publicly available information model nor a model-driven methodological approach for DMS design and deployment (the publicly available methodology of FatWire seems not to be based on an information model).

Compliance with technical standards is a requirement commonly understood and addressed by means of wide adoption of industrial standards, such as XML and related standards (Sall, 2002), LDAP (Lightweight Directory Access Protocol) (Yeong et al., 1993), SOAP (Simple Object Access Protocol) (Mitra, 2003), Internet protocols, such as HTTP (Hypertext Transfer Protocol) and FTP (File Transfer Protocol) and Java-related specifications. On the other hand, compliance with business standards is partially accomplished. As a matter of fact, while descriptive metadata standards - e.g. Dublin Core (Dublin Core Metadata Initiative, 2003) - are often used in open source solutions, metadata standards for lifecycle and access policy descriptions are scarcely used.

Even if the analysis of commercial products is limited by the lack of documentation about some requirements (especially about the use of an open information model), the overall finding of this analysis is that these products do not completely address the above-mentioned high-level requirements for DMSs. Most commercial systems have monolithic and closed architectures, provide platform-specific solutions and adopt proprietary encoding formats and algorithms (Stickler, 2001). Moreover, both commercial and open solutions rarely adopt standard modeling methodologies (Stickler, 2001; Paganelli et al., 2005). This leads to several disadvantages: poor interoperability among heterogeneous systems, limited portability across platforms, and expensive system deployment, maintenance and extension activities, which are thus often not affordable for small-medium enterprises. Generally, open source solutions better deal with requirements of open standard compliance, but do not completely fulfill the requirements of open information model and model-driven design methodology.

Based on these evaluation results, this paper aims at providing a contribution towards the definition of an information model and model-driven design methodology for DMSs, described in the following Sections.

Table 1: DMSs Evaluation results (n.a.: information not available)

| DMSs | Open information model | Standard compliance: technical standards | Standard compliance: business standards | Standard compliance: metadata standards | Model-driven design methodology |
| FatWire Content Server | n.a. | yes: LDAP, XML, SOAP and Internet protocols, Java specifications | n.a. | no | a methodology is available, but it is not based on an information model |
| Documentum | n.a. | yes: LDAP, XML, SOAP and Internet protocols | n.a. | no | n.a. |
| MARIAN | yes: open data model | yes: Internet protocols and XML | No standards for lifecycle and access policy | yes: Dublin Core compliant | The study of a standard method is in progress, based on the 5S Formal model. |
| Xinco | n.a. | yes: Internet protocols, SOAP and XML | No standards for lifecycle and access policy | no | no |

3. Document Management and Sharing Model

DMSM is an information model for Document Management Systems, representing digital documents' most relevant properties in the form of metadata. The aim of DMSM is to provide modeling constructs which facilitate the design of DMS, meeting the above-mentioned requirements of information model-driven design, standard compliance, uniform access to heterogeneous data sources and cost effectiveness.

Figure 1 shows the most important steps of the process leading to the DMSM specification: the definition of high-level requirements for DMS design, the analysis of relevant properties for document management and the analysis of metadata specification principles. This section describes the features of DMSM which are relevant for the description of the DMS design method. A detailed description of DMSM is beyond the scope of this paper. Further details can be found in previous works (Paganelli, 2004; Paganelli et al., 2004).

Figure 1. Schema of the process leading to the Document Management and Sharing information Model specification

In order to define DMSM core properties we analysed organizational digital documents as objects which:

need to be identified and searched for;
are shared among colleagues for the same or related purposes;
are characterized by different states (e.g. draft, submitted for review, final, etc.) during their lifecycle.

In order to represent these aspects, DMSM consists of three sub-models: a Descriptive Information Model, a Collaboration Model and a Process Model, which respectively allow the representation of descriptive, collaboration- and process-related characteristics of unstructured documents:

The Descriptive Information Model represents the set of properties which describe and identify the document (e.g. Title, Creator, Date, Description, Document Type, Subject, Contact, Affiliation). These properties are generally used for search and indexing purposes.

The Collaboration Model formalizes how the human resources are structured (organizational schema) and how access to information resources is regulated on the basis of organizational roles or responsibilities of individuals (access policy). This model allows the description of access policies to information resources in a customizable and standard way, on either a role or an individual basis. Organizational models then map roles and organizational functions and units to individuals or groups. The DMSML organization model specifies the organizational units, individuals and related organizational roles. In order to satisfy changing requirements (e.g. the setup of a short-term project) it may also be extended with groups or external entities which are not institutional members of the organizational model, but may be defined ad-hoc for specific purposes and have a short life.

The Process Model includes the modeling primitives describing document lifecycles and has its theoretical foundation on the Petri Net process model (van der Aalst, 1998). A document lifecycle usually consists of the following stages: creation, review, publication, access, archive and deletion. A specific lifecycle may not implement all these stages, or may implement others, depending on document types. The document lifecycle is a process specified in terms of a sequence of tasks, performed by some actors.
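As an illustration of the Petri Net foundation of the Process Model, the following is a minimal sketch of a document lifecycle expressed as a place/transition net. It is not taken from the DMSM specification: the class `LifecycleNet`, the place names and the transition names are hypothetical, chosen only to mirror the stages listed above (creation, review, publication, archive, deletion).

```python
from dataclasses import dataclass


@dataclass
class LifecycleNet:
    """A tiny place/transition net (illustrative, not the DMSM Process Model).

    marking:     number of tokens currently held by each place
    transitions: transition name -> (input places, output places)
    """
    marking: dict[str, int]
    transitions: dict[str, tuple[list[str], list[str]]]

    def enabled(self, name: str) -> bool:
        # A transition is enabled when every input place holds a token.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) > 0 for place in inputs)

    def fire(self, name: str) -> None:
        # Firing consumes one token per input place and produces one
        # token per output place.
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


def document_lifecycle() -> LifecycleNet:
    # Hypothetical lifecycle: a freshly created document holds the token.
    return LifecycleNet(
        marking={"created": 1},
        transitions={
            "submit_for_review": (["created"], ["under_review"]),
            "reject": (["under_review"], ["created"]),
            "publish": (["under_review"], ["published"]),
            "archive": (["published"], ["archived"]),
            "delete": (["archived"], ["deleted"]),
        },
    )
```

In this sketch a lifecycle that skips or adds stages is obtained simply by changing the `transitions` table, which reflects the remark that a specific lifecycle may not implement all stages or may implement others.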

The DMSM model uses some existing metadata standards, in order to promote interoperability, to create a framework of Document Management metadata, and to take advantage of existing standard contributions. DMSM uses a part of the Dublin Core metadata set (Dublin Core Metadata Initiative, 2003) in the Descriptive Information Model, the eXtensible Access Control Markup Language (XACML) (OASIS, 2003) in the Collaboration Model and the Petri Net Markup Language (PNML) (Weber & Kindler, 2002) in the Process Model.
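The idea of an access policy expressed on either a role or an individual basis, as in the Collaboration Model, can be sketched as follows. This is a simplified illustration under stated assumptions and does not use XACML syntax or the actual DMSML policy constructs: the names `Rule`, `User` and `is_permitted`, the deny-by-default behavior and the example roles are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject_kind: str  # "role" or "individual" (assumed vocabulary)
    subject: str       # role name or individual's name
    action: str        # e.g. "read", "edit"
    doc_type: str      # document type the rule applies to


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]


def is_permitted(user: User, action: str, doc_type: str, rules: list[Rule]) -> bool:
    """Deny by default; permit if any rule matches the user by role or by name."""
    for rule in rules:
        if rule.action != action or rule.doc_type != doc_type:
            continue
        if rule.subject_kind == "role" and rule.subject in user.roles:
            return True
        if rule.subject_kind == "individual" and rule.subject == user.name:
            return True
    return False


# Hypothetical policy: project managers may edit proposals, and one
# external reviewer is granted read access individually.
RULES = [
    Rule("role", "project_manager", "edit", "project_proposal"),
    Rule("individual", "external.reviewer", "read", "project_proposal"),
]
```

The individual-based rule corresponds to the case of groups or external entities that are defined ad hoc for a specific purpose rather than belonging to the institutional organizational model.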

3.1 DMSM metadata specification

The DMSML metadata specification includes two modeling levels of abstraction:

conceptual modeling, based on the UML graphical notation (Booch et al., 1998). It provides an abstract and technology-independent representation of concepts and relations among concepts. Conceptual models enable people with low technical expertise to understand the meaning of data and promote common understanding among technical and non-technical staff (i.e. end users).

logical modeling. This level translates domain-related concepts and relationships into data constructs which are expressed in a rigorous and standard logical data modeling paradigm. Our modeling approach is based on the XML Schema modeling paradigm (Sall, 2002). The XML serialization of the DMSM is called Document Management and Sharing Markup Language (DMSML).

In Figure 2 we provide an extract of the DMSM, showing a part of the conceptual representation of the DMSM Information Descriptive Model (Figure 2a) and its logical representation in XML Schema Language (Figure 2b). Figure 2c shows an instance of the DMSM for a project proposal document. The DMSM instance is an XML document which contains DMSM metadata labels and values, describing a specific document, and is valid against the syntactical rules encoded in DMSML. An example of a syntactical rule is that an element "document" should contain an "identifier", a "title", at least one "creator", etc.
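The example rule just mentioned can be made concrete with a small check written against the Python standard library. This is a sketch only, not the DMSML schema: the element names "document", "identifier", "title" and "creator" come from the text, while the absence of XML namespaces, the reading of "an identifier" and "a title" as exactly one occurrence, the sample values and the function name `check_document` are assumptions made for illustration. A real deployment would validate instances against the XML Schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance for a project proposal document (values invented).
INSTANCE = """\
<document>
  <identifier>proposal-001</identifier>
  <title>Project proposal</title>
  <creator>First Author</creator>
  <creator>Second Author</creator>
</document>
"""


def check_document(xml_text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the rule holds."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "document":
        errors.append(f"root element is <{root.tag}>, expected <document>")
    # Assumption: exactly one identifier and exactly one title.
    for name in ("identifier", "title"):
        count = len(root.findall(name))
        if count != 1:
            errors.append(f"expected exactly one <{name}>, found {count}")
    # At least one creator is required.
    if not root.findall("creator"):
        errors.append("expected at least one <creator>")
    return errors
```

Calling `check_document(INSTANCE)` on the sample above yields no violations, whereas an instance lacking an identifier or a creator yields one message per broken constraint.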

Figure 2. Example of the DMSM Information Descriptive Model: a. conceptual model; b. logical model (XML Schema); c. instance document (XML)

The 2-layered modeling approach facilitates the following steps of DMS design:

discussion and common understanding among software designers and end users, in order to define socio-organizational requirements, thanks to the conceptual abstraction;

translation of socio-organizational requirements into metadata-based technical system specifications, by means of DMSML machine-understandable syntax. As a matter of fact, DMSML is a declarative language for DMS design which encodes document descriptive properties, access policies and lifecycles in XML syntax.

Consequently, DMSML can support the design and configuration of a DMS according to the specific requirements of an organization, providing specific methods and mechanisms to exploit the business knowledge owned by end users, and leveraging compliance with standard formalisms and existing metadata specifications. For the sake of clarity, Figure 3 provides a graphical representation of DMSML main components: Information Descriptive Model, Collaboration Model, and Process Model. The complete specification can be found in a previous work (Paganelli, 2004).

Figure 3. Graphical representation of DMSML main components: Information Descriptive Model, Collaboration Model, and Process Model

4. Method for DMS design and development

This section describes the method for DMS design and development based on the DMSM information model. This DMSM-driven method covers the whole cycle of activities of DMS development. The iterative process includes the following stages, as shown in Figure 4: Preliminary Meeting, Critical Factors Analysis, Specification of a DMSM-based Solution, DMS Design, Development and Deployment, and Testing and Evaluation. Some steps include semi-structured interviews, based on reference questionnaires. In order to propose a generally-applicable approach, in this paper we describe the main objectives of the interviews and the suggested profile of the interviewees. As a matter of fact, questions should be tailored to the specific characteristics and critical factors of the target organization, and the questions and their order might consequently need to be modified on the fly. An example of a reference questionnaire is shown in Table 2; other examples can be found in a previous work (Paganelli, 2004).

Figure 4. DMSM Method

4.1 Preliminary Meeting

The first step envisages a meeting with some organization representatives. The aim is to delineate the profile of the organization and the organization's strategy for information management, in order to highlight existing inefficiencies, problems and critical factors. Two kinds of questionnaires are used for this activity.

The first questionnaire (Questionnaire A - Organization Profile) is focused on basic information about the organization's profile, such as generic information describing the organization's business goals, services and/or products offered to the market, typology of customers, partners and competitors, size (e.g. number of employees) and geographical distribution of the company's sites. This questionnaire has to be submitted to at least one person who has a deep knowledge of the company (e.g. an executive or top manager).

The second questionnaire (Questionnaire B - Practices and Applications for Unstructured Document Management in the Organization) aims to delineate the organizational strategy for information management, focusing especially on unstructured documents. The aim is to collect information about information systems in use and existing policies for document management, to understand how these policies are formalized and shared in the target organization (e.g. formalized as written procedures, tacitly shared and based on practice, etc.) and to highlight the critical factors and unresolved issues (e.g. obstacles to a DMS purchase in an organization which does not yet have a DMS). In this case, the interviewees should know which information systems are in use and how end users use them to share and manage documents for organizational purposes (e.g. a representative of the IT staff, and people who supply input and/or use output of the system).

4.2 Critical Factors Analysis

The critical factors discovered during the first stage should then be analyzed in order to find the causes of possible inefficiencies in DM strategies and/or the factors that should be improved (e.g. bad practices, deficiencies of IT tools, lack of formalized procedures). Based on these considerations, the following step aims to plan a corrective intervention. In the context of this work, the intervention is conceived as the definition of an effective solution for unstructured document management. The DMSML model can help in the formalization of a DM strategy which effectively supports the organization's processes.

4.3 DMSM-based Solution Specification

Based on the DMSM model, this stage aims to design a solution for unstructured document management, dealing with the requirements of the target organization. The first step consists of the classification of documents in use in the organization, in collaboration with some organization employees. According to the DMSM model, for each document class (e.g. technical report, project documentation and technical offers), the questions should collect information about descriptive information and collaboration and process-related properties, relevant for document management.

An example of a generally-applicable questionnaire form is provided in Table 2. The collected information should then be used in order to define the DMS specifications, organized in a Descriptive Information Model, Collaboration Model and Process Model and encoded in the DMSML syntax. Based on the collected information, the need to extend/modify the DMSML labels should then be evaluated. For instance, we can imagine that a technical offer or the technical specifications for a project should be labeled with the name of the project they refer to. In that case, the model should be extended by adding a "project" label, to further characterize and easily retrieve documents which are related on a project affiliation basis. The use of XML Schema as the encoding language facilitates the extension of the information model and the use of external metadata schemas, by means of standard mechanisms, such as xs:any, xs:import, and xs:include (Sall, 2002).
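As a rough illustration of why such an extension helps retrieval, the following Python sketch selects documents by a "project" label. The `documents` wrapper element, the flat record layout and the function name are hypothetical; only the "project" label itself comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue of document descriptions; the <project> element
# stands for the extension label discussed in the text.
CATALOG = """
<documents>
  <document>
    <title>Technical offer</title>
    <project>ProjectA</project>
  </document>
  <document>
    <title>Technical specifications</title>
    <project>ProjectA</project>
  </document>
  <document>
    <title>Brochure</title>
  </document>
</documents>
"""


def titles_for_project(xml_text, project_name):
    """Return the titles of all documents labeled with the given project."""
    root = ET.fromstring(xml_text)
    return [
        doc.findtext("title")
        for doc in root.findall("document")
        if doc.findtext("project") == project_name
    ]


print(titles_for_project(CATALOG, "ProjectA"))
```

Documents without the label (the brochure here) are simply skipped, which mirrors an optional extension element.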

Table 2. Questionnaire C - Document Properties

QUESTIONNAIRE C - Document Class Properties

a. Description

a.1 Please briefly describe the document (name, purpose, related project/organizational process, etc.)

a.2 How can this document be classified (meeting minutes, mail, report, etc.?)

a.3 How is it identified (sequential number, code, date)?

b. Collaboration

b.1 What is the access policy for this document?

b.2 How is the access policy specified and interpreted by the system?

c. Process

c.1 Is there a predefined procedure for the management of this document (e.g. guidelines, protocols, etc.)?

c.2 Is a template available?

c.3 Describe the steps of its lifecycle

d. Management

d.1 How do you usually search for this document? (e.g. by title, author, keywords, project name, etc.)

d.2 Does the document refer to other document typologies?

d.3 If it does, how? (e.g. annotations, bibliographic references, URLs, etc.)

d.4 How is versioning managed?

e. IT support

e.1 Which features are provided by the DMS for the management of this document?

Notification
Access control
Versioning
Others

e.2 Which should be provided?

f. Personal Experiences

f.1 According to your experience, what are the current problems in the management of this document type?

f.2 Would you suggest a new procedure, new features or a new solution for DM?

4.4 DMS Design, Development and Deployment

This step is focused on the design, development and deployment of the DMS. The DMSML specifications provide the formal foundation for DMS design and development. Thanks to the XML syntax, the DMSML-based specifications can be interpreted by a CASE tool for the automatic generation of DMS code. These specifications (e.g. access policies) can also be automatically enforced by the DMS during its operation.

In order to facilitate the DMSML-based design and the automation of the development and deployment stages, we developed a set of tools and applications, named DMSML Framework. Further information about the DMSML Framework is provided in Sections 5 and 6.

It is worth observing that this method aims to be general and technology-independent, and it could benefit from different CASE and fast prototyping tools, other than those provided by the DMSML Framework.

4.5 Testing and evaluation

A selected group of organization employees (a group of users) should then test the DMS during their working activities. This step aims to evaluate the capability of a DMSML-based Document Management solution to address the critical factors discovered and analysed in the first two steps of the method, as well as the level of usability of the DMSML Framework Prototype. This investigation in the organization is supported by two kinds of questionnaires:

Questionnaire D - DMSML Impact in the Organization, submitted to the group of users, in order to verify whether the new DMS solution has corrected the critical factors previously identified. The results have to be analyzed and interpreted in order to correct/refine the Document Management strategy for the target organization (as shown in Figure 4).

Questionnaire E - Usability of the DMSML Framework Prototype, submitted to the group of users and the person responsible for IT, to evaluate the level of usability of the DMSML Framework functionalities, thus providing feedback for interface re-design in the step of DMS Design, Development and Deployment and for a possible refinement of the DMSM-based Solution Specification (as shown in Figure 4).

5. DMSML Framework

The DMSML Framework is an integrated set of software tools which provide the user with automated support for DMS design, deployment and maintenance, according to the specifications encoded in the DMSML declarative language.

The DMSML Framework consists of three parts, as shown in Figure 5:

a DMS Configurator, which offers a user-friendly graphical interface facilitating the specification of a DMS solution. It enables a user (i.e. a super-user such as a DMS designer or a system administrator) to generate a DMSML instance document, containing the specifications tailored to a target organization, by means of graphical formalisms hiding the complexity of the XML syntax. To this end, the DMS Configurator acts through a wizard which progressively guides the user through the definition of the workspace, the organizational schema, the folder structure and, finally, a set of lifecycle templates and access policies, to be assigned to documents or document types. The DMSML instance so created (called DMSML specifications), containing business and organizational information such as organizational schemas and access policies, will then be processed by the DMS Generator.

a DMS Generator, which is a web-based application enabling the user (i.e. a super-user) to deploy a DMS by uploading the DMSML instance through a standard Web browser. It customizes a DMS template (i.e. a set of Document Management libraries), according to the DMSML-based specifications, and it deploys a DMS compliant with those requirements.

a DMS Web Application, which provides basic Document Management features, accessible through a standard web browser. Its configuration is described by the DMSML specifications, in terms of documents' descriptive properties, access policies and lifecycle management. The DMS Web Application configuration and deployment is supported by the DMS Configurator and the DMS Generator facilities. The functions provided by the DMS Web Application include: facilities for navigation, document upload, version control, document lifecycle management, access control, search functions (both metadata and full-text based), and log file recording. It is worth observing that the document search is based not only on descriptive metadata (such as title, author, etc.), but also on administrative metadata, related to lifecycle steps or access control rules (for instance, search for all documents in the state "draft", or all the documents which can be accessed by a project manager). The extensibility of the DMSML model allows a DMS designer to define ad-hoc search policies for target organizations.
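The combination of descriptive and administrative metadata in a search can be sketched as follows. This is an illustrative Python model, not the DMS Web Application's search engine: the `DocumentRecord` class, its field names and the sample records are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    title: str                                  # descriptive metadata
    creator: str                                # descriptive metadata
    state: str                                  # administrative: lifecycle state
    readers: set = field(default_factory=set)   # administrative: roles allowed to read


def search(records, title=None, creator=None, state=None, readable_by=None):
    """Return the records matching every criterion that is not None."""
    results = []
    for record in records:
        if title is not None and title.lower() not in record.title.lower():
            continue
        if creator is not None and record.creator != creator:
            continue
        if state is not None and record.state != state:
            continue
        if readable_by is not None and readable_by not in record.readers:
            continue
        results.append(record)
    return results


# Hypothetical sample data.
RECORDS = [
    DocumentRecord("Project proposal", "F. Paganelli", "draft", {"project manager", "author"}),
    DocumentRecord("Technical report", "M. C. Pettenati", "in_review", {"reviewer"}),
    DocumentRecord("Meeting minutes", "F. Paganelli", "draft", {"author"}),
]

# The two queries quoted in the text: documents in state "draft", and
# documents which can be accessed by a project manager.
print([r.title for r in search(RECORDS, state="draft")])
print([r.title for r in search(RECORDS, readable_by="project manager")])
```

Treating lifecycle state and access roles as ordinary searchable fields is what allows a designer to define ad-hoc search policies without changing the search code.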

Figure 5. DMSML Framework Prototype: Functional Architecture

5.1 DMS Configurator

The DMS Configurator is a Java application. Its architecture consists of an Interface, which uses the Java Swing graphic toolkit and other graphic utilities (e.g. images, etc.), and the DMS Configurator Core, built on top of the Java Virtual Machine (Figure 6.a). The DMS Configurator Core is composed of five main components:

an XML Schema Parser, which verifies the validity of the XML document against the DMSML specifications. The XML Schema Parser also contains an XML parser

the JDOM API, used to create, access and manipulate XML documents

an XPath Engine, validating XPath expressions

a Rule Manager, which interprets and enforces the rules associated with the user actions

a set of Basic Services, such as logging and data storage facilities.

Figure 6. DMSML Framework Prototype three-tiered Architecture: a. DMS Configurator, b. DMS Generator, c. DMS Web Application Architecture

5.2 DMS Generator

Both the DMS Generator and the DMS Web Application are web applications designed according to J2EE (Java 2 Enterprise Edition) specifications. They are characterized by a multi-tier architecture, consisting of a Client, an Application Logic (composed of an Interaction and a Business Logic side), and a Data tier (Figure 6b).

The Client is a standard web browser. The Interaction side is realized by means of JSPs. The Business Logic contains a template of a DMS Web Application (i.e. a set of DM libraries) and a set of APIs, called DMSG (DMS Generator) APIs. The DMSG APIs are a set of Java classes which customize the template according to specific configuration parameters, encoded in the DMSML language. Based on the features of the DMSML model, the DMS Generator allows a completely declarative approach for the design and deployment of a Document Management System for a target organization.

5.3 DMS Web Application

Analogously to the DMS Generator, the DMS Web Application has a multi-tier architecture, based on J2EE specifications, as shown in Figure 6c.

The client side is a standard Web browser. The Interaction part is realized by means of JSPs and it provides the user with core Document Management features. The Business Logic is composed of a set of DMS APIs, implemented by Java classes, which provide basic functions for the management of workspaces, folders and documents. The DMS APIs consist of several components:

a Document Manager, providing facilities for navigation, document upload, version control, etc. This part is mainly based on the Descriptive Information Model specifications (e.g. folders' organization, title, creator, etc.)

a Lifecycle Manager, which enforces the evolution of the document across the lifecycle steps, as specified according to the Process Model.

an Access Manager, which should guarantee that users execute authorized actions, according to the organizational access policies (in the Collaboration Model). As the Collaboration Model is based on the XACML standard for access policy specification, the Access Manager is based on Sun's XACML Implementation, which is an access control policy evaluation engine, written entirely in Java.

a Search Engine, enabling metadata-based and full-text document search

a History component, which records log files

Basic Services, such as monitoring and connection to database services.

6. Designing and deploying a DMS using the DMSML Framework prototype

The DMSML Framework Prototype offers support to the DMS designer during the steps of DMSML-based Solution Specification and DMS Design, Development and Deployment.

6.1 DMSML-based Specification

The DMS Configurator provides the DMS designer with a sequential set of graphical windows, which progressively guide the user in the DMS configuration, throughout the definition of the workspace, the organizational schema and the folder structure. The DMS Configurator allows the user to specify the workspace entity, characterizing the information items in terms of Descriptive Information Model, Collaboration Model and Process Model.

First, the interface enables the user to specify the workspace organization in folders and sub-folders. For instance, in the case of project documentation management, the designer can distinguish the following folders, each related to a project execution phase: Analysis, Specification, Development, Accounting. The graphical window depicted in Figure 7.a helps the user in specifying the folder organization, according to the DMSML Information Descriptive Model. Figure 7.b is an excerpt of a DMSML instance document representing the folders' organization (e.g. folder "ProjectA" and subfolders "Analysis", "Specification", "Development", "Accounting"), automatically encoded by the DMS Configurator in the DMSML syntax. The user can specify some properties for each folder: for instance "title", "creator", "affiliation", and "document types" that can be assigned to that folder. The system provides some default document types (e.g. technical report, brochure, etc.), but it also enables the user to insert ad-hoc labels. Analogously to the previous example, Figure 8 shows the graphical window for folder properties' specification (Figure 8.a) and the resulting DMSML document instance (Figure 8.b).

The system provides graphical support for the definition of lifecycle models. Figure 9.a shows the graphical representation of the lifecycle template for documents which should be evaluated by a group of reviewers and consequently accepted or rejected. The document lifecycle is a process specified in terms of a sequence of tasks. The execution of a task is usually triggered by a transition condition, which can be automatic, time-dependent (e.g. a deadline) or caused by a user action or by an external event, and it is associated with an evolution of the document state (e.g. from "draft" to "in_review", to "accepted", or "refused"). In Figure 9.a circles represent the states of documents (or "places" in the Petri Net language) and rectangles represent the transitions from one state to another. The lifecycle of the document is built upon the concatenation of these states and transitions. Figure 9.b shows an excerpt of the DMSML representation of this lifecycle template.
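The review lifecycle described above can be read as a small state machine. The Python sketch below is a simplified illustration, not the Lifecycle Manager's implementation: the `Lifecycle` class is hypothetical, the states and the "submit" transition follow the text, and the transition names "accept" and "refuse" are assumed.

```python
class Lifecycle:
    """A document lifecycle: states (places) connected by named transitions."""

    def __init__(self, initial_state, transitions):
        # transitions maps (current_state, transition_name) -> next_state
        self.state = initial_state
        self.transitions = dict(transitions)

    def fire(self, transition_name):
        """Apply a transition if it is enabled in the current state."""
        key = (self.state, transition_name)
        if key not in self.transitions:
            raise ValueError(
                f"transition '{transition_name}' is not enabled in state '{self.state}'"
            )
        self.state = self.transitions[key]
        return self.state


# Template for documents subject to review (transition names partly assumed).
REVIEW_TEMPLATE = {
    ("draft", "submit"): "in_review",
    ("in_review", "accept"): "accepted",
    ("in_review", "refuse"): "refused",
}

document_lifecycle = Lifecycle("draft", REVIEW_TEMPLATE)
print(document_lifecycle.fire("submit"))
print(document_lifecycle.fire("accept"))
```

Keeping the template as plain data separate from the class reflects the idea of a collection of lifecycle templates that can be assigned to different document types.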

These lifecycle models serve as a collection of templates which can then be assigned to documents in order to enforce their evolution during their "life". At design time, the user can assign a lifecycle template to the document types previously defined. In order to accommodate a certain level of flexibility, this pre-assignment can be modified by document creators by means of a proper interface offered by the DMS.

Finally, the designer can specify the access control policies which regulate the access to the information items on the basis of roles and responsibilities defined in the organization, as illustrated in Figure 10.a. The DMS Configurator automatically generates the DMSML instance document (Figure 10.b) and checks the validity of the specification according to the DMSML rules.

<workspace xmlns="http://det.unifi.it/dmsml">
  <folder>
    <itemDescription>
      <dc:title>ProjectA</dc:title>
      ...
    </itemDescription>
    <folder>
      <itemDescription>
        <dc:title>Analysis</dc:title>
      </itemDescription>
    </folder>
    <folder>
      <itemDescription>
        <dc:title>Specification</dc:title>
      </itemDescription>
    </folder>
    .....(other folders)
  </folder>
</workspace>

7.a DMS Configurator interface for folders' organization specification

7.b DMSML instance document excerpt for folders' organization specification (DMSML Descriptive Information Model)

Figure 7. DMS specification: organization in folders and subfolders
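To show how an application might consume the folder organization of Figure 7.b, the following Python sketch walks the nested folder elements and prints one path per folder. It rests on two assumptions: the excerpt does not declare the dc prefix, so the standard Dublin Core namespace URI is supplied here, and the elided parts of the excerpt are simply left out. The function name `folder_paths` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "dmsml": "http://det.unifi.it/dmsml",        # default namespace of the excerpt
    "dc": "http://purl.org/dc/elements/1.1/",    # assumed binding for the dc prefix
}

WORKSPACE = """
<workspace xmlns="http://det.unifi.it/dmsml"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <folder>
    <itemDescription><dc:title>ProjectA</dc:title></itemDescription>
    <folder>
      <itemDescription><dc:title>Analysis</dc:title></itemDescription>
    </folder>
    <folder>
      <itemDescription><dc:title>Specification</dc:title></itemDescription>
    </folder>
  </folder>
</workspace>
"""


def folder_paths(element, prefix=""):
    """Yield the path of every folder nested under element, depth first."""
    for folder in element.findall("dmsml:folder", NS):
        title = folder.findtext("dmsml:itemDescription/dc:title", namespaces=NS)
        path = f"{prefix}/{title}"
        yield path
        yield from folder_paths(folder, path)


for path in folder_paths(ET.fromstring(WORKSPACE)):
    print(path)
```

The recursion follows the nesting of folder elements directly, so sub-folders of any depth are handled without further changes.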

<folder>
  <itemDescription>
    <dc:title>Specification</dc:title>
    <dc:creator>F. Paganelli</dc:creator>*
    <dc:description>documentation about ProjectA specification step</dc:description>
    <dc:date>2005-09-24</dc:date>*
    <affiliation>http://det.unifi.it</affiliation>*
    <contactInfo>
      <contactName>F. Paganelli</contactName>*
      <address>via S. Marta 3, Firenze</address>*
      <telephoneNumber>+39 055 4796382</telephoneNumber>*
      <faxNumber>+39 055 488883</faxNumber>*
      <e-mail>[email protected]</e-mail>*
      <url>http://radar.det.unifi.it/people/Paganelli/index.html</url>*
    </contactInfo>
    <documentTypes>
      <documentType>technical reports</documentType>
      <documentType>specifications</documentType>
      <documentType>brochures</documentType>
    </documentTypes>
  </itemDescription>
</folder>

* these metadata are automatically filled by the system

8.a DMS Configurator interface for folders' properties specification

8.b DMSML instance document excerpt for folders' properties specification (DMSML Descriptive Information Model)

Figure 8. DMS specification: folders' characteristics definition

<lifecycle>
  <name>lifecycleTemplate</name>
  <description>lifecycle of document subjected to review</description>
  <tasks>
    <task>
      <name>submission</name>
      <description>document submission</description>
      <transition type="userAction">
        <name>submit</name>
      </transition>
      <inputState>
        <inputStateName>draft</inputStateName>
      </inputState>
      <outputState>
        <outputStateName>in_review</outputStateName>
      </outputState>
      <automaticActions>
        <notification>
          <description>reviewers are notified that a new document has been submitted</description>
          <receivers>
            <receiverID>reviewer1</receiverID>
            <receiverID>reviewer2</receiverID>
          </receivers>
          <message>"A new document has been submitted for review"</message>
        </notification>
      </automaticActions>
    </task>
    (other tasks)
  </tasks>
</lifecycle>

9.a DMS Configurator interface for lifecycle templates specification

9.b DMSML instance document excerpt for lifecycle templates specification (DMSML Process Model)

Figure 9. DMS specification: lifecycle templates

<xacml:Policy PolicyId="document_revisionPolicy">
  <xacml:Description>Access Policy for the action: "accept document"</xacml:Description>
  <xacml:Target>
    <xacml:Subjects>
      <xacml:AnySubject/>
    </xacml:Subjects>
    <xacml:Resources>
      <xacml:AnyResource/>
    </xacml:Resources>
    <xacml:Actions>
      <xacml:Action>
        <xacml:ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
          <xacml:AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">accept</xacml:AttributeValue>
          <xacml:ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:1.0:action:action-id" DataType="http://www.w3.org/2001/XMLSchema#string"/>
        </xacml:ActionMatch>
      </xacml:Action>
    </xacml:Actions>
  </xacml:Target>
  <xacml:Rule RuleId="document_revision_Rule" Effect="Permit"/>
  (other rules)
</xacml:Policy>

10.a DMS Configurator interface for access policies' specification

10.b DMSML instance document excerpt for access policies specification (DMSML Collaboration Model)

Figure 10. DMS specification: access policies
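The effect of the policy in Figure 10.b can be approximated with a few lines of Python. This is a deliberately reduced sketch and not the XACML evaluation engine used by the Access Manager: it only models a target that matches the request's action identifier by string equality, with any subject and any resource, and the `Policy` class and `evaluate` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    policy_id: str
    action: str   # value compared by string equality with the request's action-id
    effect: str   # "Permit" or "Deny"


def evaluate(policy, request):
    """Return the policy effect if the target matches, else 'NotApplicable'."""
    if request.get("action-id") == policy.action:
        return policy.effect
    return "NotApplicable"


# Values taken from the excerpt: the policy targets the action "accept"
# and its rule has the effect "Permit".
revision_policy = Policy("document_revisionPolicy", "accept", "Permit")

print(evaluate(revision_policy, {"action-id": "accept"}))
print(evaluate(revision_policy, {"action-id": "delete"}))
```

A full XACML engine additionally evaluates subject and resource matches, rule conditions and rule-combining algorithms, all of which are omitted here.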

6.2 Design, Development and Deployment of the Document Management System

The DMSML specification is processed by the DMS Generator in order to properly customize the DMS template according to the organization's specific requirements (Figure 11). The DMS Generator web interface enables the user to upload the DMSML specification, called Business Configuration Document, together with the technical parameters (e.g. connection to databases, IP addresses, etc.) encoded in an XML document, named Technical Configuration Document. Figure 12 shows an excerpt of a Technical Configuration Document specifying the parameters for a connection to a SQL database.

The DMS Web Application offers an intuitive interface with basic Document Management functionalities. The browsing and metadata-based search interfaces are shown in Figure 13 and Figure 14, respectively.

Figure 11. DMS Generator graphical interface

<system xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="technology.xsd">
  <sql>
    <sqldriver>com.microsoft.jdbc.sqlserver.SQLServerDriver</sqldriver>
    <connection>jdbc:microsoft:sqlserver://localhost:3106</connection>
    <user>admin</user>
    <password>dmsml</password>
  </sql>
  <display-name>DMSWebApp</display-name>
  <context-root>dmswebapp</context-root>
  (other parameters)
</system>

Figure 12. Technical Configuration document excerpt
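As an illustration of how a generator could read such a document, the Python sketch below extracts the database and deployment parameters into a dictionary. The element names and values are those of the excerpt in Figure 12; the function name and the dictionary keys are hypothetical, and the schema-location attributes and the elided parameters are omitted.

```python
import xml.etree.ElementTree as ET

# Technical Configuration Document reduced to the elements shown in Figure 12.
CONFIG = """
<system>
  <sql>
    <sqldriver>com.microsoft.jdbc.sqlserver.SQLServerDriver</sqldriver>
    <connection>jdbc:microsoft:sqlserver://localhost:3106</connection>
    <user>admin</user>
    <password>dmsml</password>
  </sql>
  <display-name>DMSWebApp</display-name>
  <context-root>dmswebapp</context-root>
</system>
"""


def read_technical_config(xml_text):
    """Collect database and deployment parameters from the configuration."""
    root = ET.fromstring(xml_text)
    sql = root.find("sql")
    return {
        "driver": sql.findtext("sqldriver"),
        "url": sql.findtext("connection"),
        "user": sql.findtext("user"),
        "password": sql.findtext("password"),
        "display_name": root.findtext("display-name"),
        "context_root": root.findtext("context-root"),
    }


config = read_technical_config(CONFIG)
print(config["driver"])
print(config["context_root"])
```

Keeping these parameters in a document separate from the Business Configuration Document means the same DMSML specification can be redeployed against a different database by changing only this file.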

Figure 13. DMS Web Application: browsing interface

Figure 14. DMS Web Application: search interface

7. Discussion

The DMS Web Application aims to cover the above-mentioned requirements for DMSs: Design methodology based on information models, Standard compliance, Uniform access to heterogeneous formats, and Cost effectiveness.

To this end, we have proposed a DMS design method which makes extensive use of the Document Management and Sharing information Model, throughout the steps of preliminary analysis, critical factor analysis, design, development and deployment, and testing and evaluation of a DMS in an organization.

The DMSM is a metadata specification which encompasses descriptive, as well as collaborative and process-dependent properties of organizational documents. The DMSM provides a formal, lower-level (structural) description of an information model for DMSs and supports the conception of a completely declarative approach for DMS design and automatic deployment.

The XML serialization of the model (DMSML) is a declarative language which allows the mapping of organizational requirements into machine-understandable technical DMS specifications. As a matter of fact, a DMSML instance contains XML tags enabling the description of the workspace configuration and folder organization, the creation or reuse of a document resource classification schema, the specification of the lifecycle and the access policies assigned to documents either separately or on a document type basis.

This work has helped to address the need for standard methodological approaches for DMS design by proposing a generally-applicable and technologically-independent method based on the DMSM information model. While the specifications in most available products are generally embedded in proprietary workflow engines or collaborative applications, DMSML is a declarative language, based on an open and standard-compliant data model.

Moreover, the DMSML Framework Prototype provides automation support to the design method, reducing the need for technical expertise for DMS configuration (the DMS designer is not concerned with the DMSML syntax) and deployment (he/she should upload two XML documents and the system automatically deploys a customized DMS).

Secondly, standard compliance has been achieved in two ways: the DMSML language integrates three existing metadata standards (Dublin Core, XACML and PNML), and the DMSML Framework is based on standard Web development specifications (i.e. J2EE), and standard languages and technologies, such as XML and XSLT (Sall, 2002).

The other requirements (i.e. Uniform access to heterogeneous data sources and Cost effectiveness) have been partially addressed.

As a matter of fact, the use of web standards and protocols allows access to information stored in heterogeneous locations, but does not effectively support information retrieval, indexing and processing across heterogeneous repositories. The client side is implemented by standard Web browsers, thus providing users with a well-known and uniform paradigm of access, search and retrieval to documents available in heterogeneous formats and stored in heterogeneous locations.

Cost effectiveness is promoted by several factors: the DMS Web Application, as well as the whole DMSML Framework, is based on open source technologies. Furthermore, the instrumental support provided by the DMS Configurator and the DMS Generator speeds up the process of design, development and deployment of the DMS solution and hides some technical complexities (such as XML syntax). Because of these cost savings, the DMS Web Application is a candidate for a Document Management solution which is also suitable for addressing SMEs' requirements, but this hypothesis needs to be carefully validated in target organizations.

These issues are going to be addressed in on-going and future activities. Firstly, we are experimenting with the proposed methodology and the use of the DMSML Framework for the management of scientific documentation (papers, theses, project documentation, etc.) in our Department. We have also planned an evaluation activity in a small enterprise. The selected SME is an Italian consulting firm which provides IT services and products to a wide range of customer enterprises. Consulting firms are highly data-intensive companies, since they depend heavily on the expertise of their people and the documented information produced during their business activities.

Better management of heterogeneous and distributed content repositories could be achieved by adopting metadata harvesting protocols, which gather metadata about content for resource discovery across heterogeneous repositories. One of the most important harvesting protocols is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (Lagoze & Van de Sompel, 2002), which is widely adopted for digital libraries and cultural heritage information systems.
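To give an idea of what harvesting involves, the Python sketch below builds an OAI-PMH ListRecords request. The verb and the argument names (metadataPrefix, set, from) are those defined by the protocol, and oai_dc is its mandatory Dublin Core format; the repository base URL and the helper function are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # optional selective harvesting by set
    if from_date:
        params["from"] = from_date      # optional lower bound on the datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint.
print(list_records_url("http://example.org/oai", from_date="2005-01-01"))
```

A harvester would issue such requests periodically and index the returned Dublin Core records, which fits the DMSML choice of Dublin Core for descriptive metadata.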

A higher degree of effective and uniform access to heterogeneous and distributed data sources would require a system to better deal with interoperability requirements. The long-term objective would be that of "enabling a machine to read documents of varying degrees of structures from heterogeneous data sources and understand the meaning of each document in order to find associations among those documents" (Fisher & Sheth, 2004). Achieving that kind of interoperability is obviously not a trivial task. Ontology-driven metadata extraction and annotation mechanisms can be used to support advanced classification techniques and to provide metadata with contextual relevance within a given domain. These techniques could provide a normalized "semantic" view of heterogeneous data, providing a certain degree of machine understanding and processing, across syntactical and structural differences of information sources.

As regards cost and feasibility evaluation, the DMSML Framework has been developed in an academic setting as a prototype. Consequently, usability tests and user-centered re-design of the existing prototype interface should be performed, together with a market analysis and a business plan, in order to promote the transfer of research into industrial application. At present, we are evaluating the possibility of creating an open source project, based on the DMSML Framework, in order to benefit from the advantages of cooperative software development.

8. Conclusions

This paper described a DMS design method based on the DMSM information model. DMSM is a metadata specification which encompasses descriptive, collaborative and process-related properties of organizational documents. The method encompasses the stages of Preliminary Meeting, Critical Factors Analysis, Specification of a DMSM-based Solution, DMS Design, Development and Deployment, and Testing and Evaluation. The DMSML (i.e. the XML serialization of DMSM) enables a declarative design approach and the DMSM Framework Prototype (i.e. a set of tools for DMSML editing and DMS generation) facilitates automatic development and deployment of a DMS for a target organization.

This model-driven method satisfies two basic requirements for DMS design: design methodology based on information models and standard compliance. We also described future research activities aimed at evaluating the method in an SME and addressing requirements of uniform access to heterogeneous formats and cost effectiveness.

References

Booch, G., Jacobson, I., & Rumbaugh, J. (1998). Unified Modeling Language User's Guide (Boston: Addison-Wesley)

Ceri, S., Fraternali, P., & Bongio, A. (2000) "Web modeling language (WebML): a modeling language for designing web sites". Computer Networks, Vol. 33 (1-6), 137-157

Chen-Burger, Y. H., Tate, A., & Robertson, D. (2002) "Enterprise Modelling: A Declarative Approach for FBPML". In Proceedings European Conference of Artificial Intelligence, Knowledge Management and Organisational Memories Workshop

Dublin Core Metadata Initiative (2003) Dublin Core Metadata Element Set, version 1.1: Reference description http://www.dublincore.org

Elmasri, R., & Navathe, S.B. (2003) Fundamentals of Database Systems (Addison Wesley)

Fisher, M., & Sheth, A. (2004) "Semantic Enterprise Content Management". Practical Handbook of Internet Computing, edited by Munindar P. Singh (Baton Rouge: Chapman Hall & CRC Press)

Ginsburg, M. (2001) "Openness: The Key To Effective Intranet Document Management". In Proceedings of International Symposium on Information Systems and Engineering ISE'2001, Las Vegas, USA

Gonçalves, M. A., Fox, E. A., Watson, L. T., & Kipp, N. A. (2004) "Streams, structures, spaces, scenarios, societies (5s): A formal model for digital libraries". ACM Transactions on Information Systems, Vol. 22 No. 2, 270-312

Graves, M. (2001) Designing XML Databases (Prentice Hall)

Halasz, F., & Schwartz, M. (1994) "The Dexter Hypertext Reference Model". Communications of the ACM, Vol. 37(2)

Hendley, T. (2005) Managing Information and Documents. The definitive guide Cimtech Ltd http://www.doconsite.co.uk/

Karjalainen, A., Päivärinta, T., Tyrväinen, P., & Rajala, J. (2000) "Genre-based metadata for enterprise document management". In Proceedings of the 33rd Hawaii International Conference on System Sciences HICSS (Los Alamitos CA: IEEE Computer Society), pp. 3013-3023

Lagoze, C., & Van de Sompel, H. (2002) The Open Archives Initiative Protocol for Metadata Harvesting. Open Archives Initiative http://www.openarchives.org/OAI/openarchivesprotocol.html

Mitra, N. (2003). SOAP Version 1.2 Part 0: Primer. W3C Recommendation. Retrieved October 2003 from http://www.w3.org/TR/soap12-part0/

Montero, S., Díaz, P., Dodero, J. M., & Aedo, I. (2004) "AriadneTool: A Design Toolkit for Hypermedia Applications". Journal of Digital Information, Vol. 5 No. 2 Article No. 280 http://jodi.ecs.soton.ac.uk/Articles/v05/i02/Montero/

Moore, C., & Markham, R. (2002) "Enterprise Content Management: A Comprehensive Approach for Managing Unstructured Content". Giga Information Group, Inc. http://www.msiinet.com/html/pdfs/essecm3.pdf

Murphy, L.D. (1998) "Digital document metadata in organizations: Roles, analytical approaches, and future research directions". In Proceedings of the 31st Hawaii International Conference on System Sciences: Digital Documents (Los Alamitos CA: IEEE Computer Society), pp. 267-276

OASIS (2003) Extensible Access Control Markup Language (XACML), V. 1.0 http://www.oasis-open.org

Paganelli, F. (2004) A Metadata Model for Unstructured Document Management in Organizations. PhD dissertation, Department of Electronics and Telecommunications, University of Florence, Italy

Paganelli, F., Abou Khaled, O., Pettenati, M.C.P., & Giuli, D. (2004) "A Metadata Model for the Design and Deployment of Document Management Systems". In Proceedings of ICWE 2004 (Springer Verlag), pp. 589-590

Paganelli, F., Pettenati, M.C.P., & Giuli, D. (2005) "A Metadata-based Approach for Unstructured Document Management in Organizations". To be published in Information Resource Management Journal (IDEA Group)

Pras, A., & Schoenwaelder, J. (2003) RFC 3444 - On the Difference between Information Models and Data Models. The Internet Engineering Task Force http://www.faqs.org/rfcs/rfc3444.html

Päivärinta, T. (2001) A Genre-Based Approach to Developing Electronic Document Management in the Organization. PhD dissertation, University of Jyvaskyla, Finland

Sall, K. B. (2002) XML Family of Specifications: A Practical Guide (Boston: Addison-Wesley Professional)

Salminen, A., Lyytikäinen, V., & Tiitinen, P. (2000) "Putting documents into their work context in document analysis". Information Processing and Management, 36(4), 623-641

Sprague, R. H. Jr. (1995) "Electronic document management: Challenges and opportunities for information systems managers". MIS Quarterly, 19(1), 29-50 http://www.cba.hawaii.edu/sprague/MISQ/MISQfina.htm

Stickler, P. (2001) "Metia - a generalized metadata driven framework for the management and distribution of electronic media". In Proceedings of Dublin Core Conference 2001, pp. 235-241

Tongrungrojana, R., & Lowe, D. (2004) "WIED: A Web Modelling Language for Modelling Architectural-Level Information Flows". Journal of Digital Information, Vol. 5 No. 2 Article No. 283 http://jodi.tamu.edu/Articles/v05/i02/Tongrungrojana/

the451 (2002). "Unstructured Data Management: the elephant in the corner" last updated 2003 http://www.the451.com

Uschold, M., King, M., Moralee, S., & Zorgios, Y. (1998) "Enterprise ontology". The Knowledge Engineering Review: Special Issue on Putting Ontologies to Use, Vol. 13(1), 31-89

van der Aalst, W.M.P. (1998) "The Application of Petri Nets to Workflow Management". J. of Circuits, Systems, and Computers, Vol. 8(1), 21-66

van der Aalst, W.M.P., Kumar, A., & Verbeek, H.M.W. (2003) "Organizational Modeling in UML and XML in the context of Workflow Systems". In Proceedings of the 18th Annual ACM Symposium on Applied Computing (SAC 2003) edited by H. Haddad and G. Papadopoulos

Weber, M., & Kindler, E. (2002) "The petri net markup language". Advances in Petri Nets, LNCS series (Springer Verlag)

Yeong, W., Howes, T., & Kille, S. (1993). X.500 lightweight directory access protocol. IETF RFC 1487. Retrieved October 15, 2002, from http://www.ietf.org/rfc/rfc1487.txt

Links

Documentum http://www.documentum.com

FileNet http://www.filenet.com

IBM Lotus Notes http://www.ibm.com

Interwoven http://www.interwoven.com

Microsoft SharePoint http://www.microsoft.com/sharepoint/default.mspx

Stellent http://www.stellent.com

OpenCMS http://www.opencms.org

Apache Lenya http://lenya.apache.org

DSpace http://www.dspace.org

MARIAN http://www.dlib.vt.edu/products/marian.html

Xinco http://www.xinco.org/

Sun's XACML Implementation http://sourceforge.net/projects/sunxacml

Notes

1. More precisely, MARIAN is a digital library system (DLS). It is taken into account in the context of this work because it is a good example of a system based on an information model (the 5S formal model), and because DLSs and DMSs have features in common.
