CineGrid Exchange: A workflow-based peta-scale distributed storage platform on a high-speed network


Future Generation Computer Systems 27 (2011) 966–976


CineGrid Exchange: A workflow-based peta-scale distributed storage platform on a high-speed network

Shaofeng Liu a,b,∗, Jurgen P. Schulze b, Laurin Herr c, Jeffrey D. Weekley d, Bing Zhu e, Natalie V. Osdol c, Dana Plepys f, Mike Wan e

a Department of Computer Science and Engineering, University of California San Diego (UCSD), La Jolla, CA, USA
b California Institute for Telecommunications and Information Technology (Calit2), UC San Diego, La Jolla, CA, USA
c Pacific Interface, Inc., Oakland, CA, USA
d MOVES Institute, Naval Postgraduate School, Monterey, CA, USA
e Institute for Neural Computation, University of California San Diego, La Jolla, CA, USA
f Electronic Visualization Laboratory, Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA

Article info

Article history:
Received 21 March 2010
Received in revised form 23 July 2010
Accepted 12 November 2010
Available online 21 November 2010

Keywords:
CineGrid Exchange
Digital archiving
4K
Distributed storage
iRODS

Abstract

The Academy of Motion Picture Arts and Sciences (AMPAS) report ‘‘The Digital Dilemma’’ describes the issues caused by the rapid increase of storage requirements for long-term preservation and access of high quality digital media content. As one of the research communities focusing on very high quality digital content, CineGrid addresses these issues by building a global-scale distributed storage platform suitable for handling high quality digital media, which we call the CineGrid Exchange (CX). Today, the CX connects seven universities and research laboratories in five countries, managing 400 TB of storage, of which 250 TB are dedicated to CineGrid. All of these sites are interconnected through a 10 Gbps dedicated optical network. The CX distributed repository holds digital motion pictures at HD, 2K and 4K resolutions, digital still images and digital audio in various formats. The goals of the CX are: (1) providing a 10 Gbps interconnected distributed platform for the CineGrid community to study digital content related issues, e.g., digital archiving, the movie production process, and network transfer/streaming protocols; (2) building a tool with which people can securely store, easily share and transfer very high definition digital content worldwide for exhibition and real-time collaboration; (3) automating digital policies through middleware and metadata management. In this publication, we introduce the architecture of the CX, the resources managed by the CX, and the implementation of the first series of CX management policies using the iRODS programmable middleware. We evaluate the first phase of the CX platform implementation. We show that the CX has the potential to be a reliable and scalable digital management system.

Published by Elsevier B.V.

1. Introduction

‘‘The Digital Dilemma’’ [1] pointed out that the rapidly increasing use of digital technology in the acquisition, post-production and distribution of media content not only brings significant benefits to the motion picture industry, as well as to other stakeholders in the media ecosystem, but also raises serious issues on how to manage the large amount of resulting digital content efficiently over

∗ Corresponding author at: Department of Computer Science and Engineering, University of California San Diego (UCSD), La Jolla, CA, USA.

E-mail addresses: [email protected], [email protected] (S. Liu), [email protected] (J.P. Schulze), [email protected] (L. Herr), [email protected] (J.D. Weekley), [email protected] (B. Zhu), [email protected] (N.V. Osdol), [email protected] (D. Plepys), [email protected] (M. Wan).

0167-739X/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.future.2010.11.017

long periods of time. Given that a single version of a movie created in ultra-high quality can fill tens of terabytes [1] of digital storage space, current technologies for preserving large amounts of data seem far behind the industry's needs. A unique feature of digital media content is its mobility. For instance, over the life cycle of a movie, starting from its creation, digital media will need to be moved from one system to another many times for post-production, cinema distribution, long-term archiving, and on-demand retrieval. Therefore, how to manage and transfer digital content efficiently becomes increasingly important. The traditional method of delivering film cans by courier was initially adapted to deliver hard disk drives and data tapes. But with the increasing volume of data transfers required for modern media productions, which are themselves increasingly distributed among team members spread around the world, physical delivery – and physical preservation – of digital media assets no longer satisfies the industry's requirements.


On the other hand, large amounts of long distance fiber optic cable have been installed during the past decade, which now makes dedicated 1–10 Gbps fiber connections more affordable. This trend is making it increasingly practical to transfer large digital media files between remote sites [2–4]. Furthermore, a distributed storage model has the potential of unifying resources around the world to form a petabyte scale distributed storage platform for media exchange and preservation.

CineGrid [5] is a research community with the mission ‘‘To build an interdisciplinary community that is focused on the research, development, and demonstration of networked collaborative tools to enable the production, use, preservation and exchange of very high quality digital media over photonic networks’’. Members of CineGrid are a mix of post-production facilities, media arts schools, research universities, scientific laboratories, and hardware/software developers around the world connected by up to 10 Gbps networks. Since 2005, CineGrid members have conducted pioneering experiments in digital media production and post-production, network streaming delivery, exhibition, and remote collaboration. These experiments created media assets that CineGrid members wanted to access over time, which meant they had to be stored somewhere, managed to ensure access and preservation, and transferred upon request among members scattered around the world. The first CineGrid Exchange (CX) nodes at UCSD/Calit2, UvA and Keio/DMC were established on an ad hoc basis to fulfill this requirement. But over time, as the number and size of CX nodes has increased, and the number and variety of digital media assets in the CX have grown, the CineGrid community has seen a need to systematically combine and integrate existing CX resources in order to provide a scalable solution for the storage of digital assets.

Just as distributed rendering architectures have been adopted by cinema post-production facilities to deliver visual effects shots on tight schedules, the trends in cloud computing (private and public) for virtualization of servers, storage and high-speed network infrastructure can be adopted for distributed digital content creation, distribution, library and archiving services. The CX is a pioneering effort to establish a global-scale networked test bed that can be used to experiment with pre-commercial technologies and prototype collaborative workflows. At the same time, the CX fulfills CineGrid's own requirements for secure access to the organization's terabytes of digital media content by replicating assets in geographically distributed repositories connected by persistent 10 Gbps networks.

With support from AMPAS and contributions from other CineGrid members, the CX development project was started in 2009 to design and implement a multi-layer open source digital media asset management system. This is the first large scale, distributed global storage implementation designed to handle digital motion picture materials at the highest quality and for the investigation of issues related to digital preservation storage networks.

The three major goals of the CX are: (1) providing a dedicated 1–10 Gbps interconnected distributed platform for the CineGrid community to study digital content related issues, e.g., digital archiving, the movie production process, and network transfer/streaming protocols; (2) building a tool with which people can securely store, seamlessly stream, share and transfer very high quality digital content across the world; (3) automating digital policies through middleware and metadata management, so people can define flexible content ingestion, digestion, replication, and deletion processes.

In Section 2 of this publication, we will discuss previous related work; in Section 3, we will describe the three-layer architecture of the CineGrid Exchange; in Section 4, we will discuss the CX policies and workflows. In Section 5, we show the user interfaces of the CX, along with some experimental results and experiences. Finally, we conclude and suggest future work in Section 6.

Fig. 1a. HIPerSpace, one of the world's largest display walls, at Calit2, UC San Diego.

Fig. 1b. The STARCAVE, a third-generation CAVE and virtual reality OptIPortal at Calit2, UC San Diego.

2. Related work

For a few years, people have been experimenting with super-high-definition digital content, such as 4K, and with transferring this content over high-speed lambda networks, for example by streaming 4K video clips. ‘‘4K’’ is a super-high-definition motion picture format. As defined by the Digital Cinema Initiative (DCI) consortium of Hollywood studios in 2003, 4K has up to 4096 pixels per line and up to 2160 lines per frame. In our paper, 4K refers to a frame size of 3840 × 2160 pixels, which is exactly four times the resolution of 1080p HDTV. 4K digital motion pictures can be captured by 4K cameras and displayed on 4K projectors, or they can be created synthetically, without a camera, using existing computer animation and scientific visualization tools. In addition to 4K projectors, the OptIPortal [6] and the STARCAVE [7] are other examples of display devices for 4K and higher resolution digital content, shown in Figs. 1a and 1b.

The world's first (compressed) 4K tele-presence was demonstrated at the iGrid Conference in 2005 [8], as shown in Fig. 2. Both live and pre-recorded 4K content was compressed at 500 Mbps and streamed in real time using NTT Network Innovations Laboratories' JPEG 2000 hardware codec via 1 Gb IP networks, from Keio University in Tokyo to iGrid 2005 in San Diego. Uncompressed 4K streaming typically consumes a bandwidth of 6–8 Gbps and requires high-end hardware components throughout the data path; compressed streaming media may have data rates under 1 Gbps.
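A back-of-the-envelope check (our own, assuming 24 frames/s and 30–36 bits per pixel, i.e., 10–12-bit RGB) is consistent with these figures: 3840 × 2160 pixels × 30 bits/pixel × 24 frames/s ≈ 6.0 Gbps, while 4096 × 2160 pixels × 36 bits/pixel × 24 frames/s ≈ 7.6 Gbps, both within the 6–8 Gbps range quoted above for uncompressed 4K.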

This work showed the tremendous potential of using high-speed optical network connections to exchange large amounts of data between remote storage systems, which is superior to physically moving storage devices around. However, these experiments were temporary installations that required configuration, tuning and testing, and in turn a great deal of effort to repeat.

Data management software is a key component in large-scale digital content preservation. To gain flexibility in managing digital content dynamically, we investigated iRODS, an evolving piece of software that provides programming capabilities allowing us to integrate digital management policies into the software seamlessly, through its API of so-called ‘‘rules’’. iRODS, the Integrated Rule-Oriented Data System, is ‘‘a data grid software


Fig. 2. 4K tele-conference between Tokyo and San Diego using the NTT JPEG 2000 codec at the iGrid Conference 2005.

system developed by the Data Intensive Cyber Environments research group and collaborators. iRODS management policies (sets of assertions these communities make about their digital collections) are characterized in iRODS Rules and state information. At the iRODS core, a Rule Engine interprets the Rules to decide how the system responds to various requests and conditions’’. More explanation and examples will be given in Section 5.

3. CX architecture and hierarchy

The CineGrid Exchange project intends to create persistent hardware and network configurations enabling a reliable distributed storage platform dedicated to the CineGrid community. This platform serves as a test bed that continually evolves to fit the participants' research requirements.

The CX aims to logically combine physically distributed storage systems at a number of locations, and to manage these resources as if they were a single storage system. Setting up a worldwide distributed storage system involves a high level of international cooperation to provision storage capacity, network links and technical support at every participating site.

The CineGrid Exchange was first instantiated at three locations: Calit2 at the University of California San Diego, the Keio University Research Institute for Digital Media and Content in Tokyo, Japan, and the University of Amsterdam in The Netherlands (Fig. 3a). The three sites are interconnected using 10 GigE links provided by CineGrid network members who are part of the Global Lambda Integrated Facility (GLIF) and other research and educational networks around the world [2,9]. Over time, more research institutes have joined the CineGrid Exchange, contributing storage and human resources to CX development. By the end of 2009, as shown in Fig. 3b, the CX has seven participants on three continents, interconnected by a 10 Gbps network. These participants provide the infrastructure for the CineGrid Exchange.

3.1. CX infrastructure

The CineGrid Exchange infrastructure consists of the following components.

1. Storage subsystems. A CX storage subsystem, also called a Content Repository, can be any type of storage system including LTO, RAID array, or storage server. Equipment used for CX storage ranges from high-speed Sun Fire X4540 servers (''Thumpers'') to storage arrays connected to a Linux PC. The storage resources at CX nodes as of January 2010 are shown in Table 1. The total capacity of the CX expanded from 50 to 250 TB in 2009 and is still growing. Additional active sites in the USA,

Fig. 3a. The CX's three initial sites: San Diego, Tokyo and Amsterdam, which are connected via 10 Gbps optical networks.

Fig. 3b. By the end of 2009, the CX has included seven sites globally: Tokyo, San Diego, Los Angeles, Chicago, Toronto, Amsterdam and Prague. They are connected by 10 Gbps optical networks.

Brazil and Australia are planned for the first quarter of 2010. To address the need for offline storage resources, an LTO (Linear Tape-Open) storage/library solution is planned to provide CX long-term bulk storage capability in 2010.

2. Network bandwidth. Using DWDM (Dense Wavelength-Division Multiplexing) technology, 1–10 Gbps optical networks have been adopted by the global research community. CineGrid Exchange's network infrastructure includes many high-speed advanced network resources (CaveWave, CiscoWave, StarLight, TransLight, JGN2, etc.) that are part of the GLIF. We will discuss later in this publication whether the 10 Gbps links used in the CX may be over-provisioned.

3. Input/output portal. Very high quality digital media content is often created using specialized media devices like 4K or higher resolution cameras. Very high definition content also needs a means to be displayed. As stated in Section 2, specialized input/output devices (4K cameras, 4K projectors driven by an NTT codec, OptIPortals or the STARCAVE driven by cluster computers capable of displaying very high definition digital content) can be considered an extension of CX resources and overall infrastructure.

4. Computing resources. It is expected that eventually users of the CX will also request computational tasks: image processing, compression/decompression, transcoding, format conversion,


Table 1. CX nodes and their capacity.

CX Node Site                        Storage Type                  CX Allocation
Calit2/UCSD, San Diego, USA         Three Sun Thumpers (x4540)    66 TB
UvA, Amsterdam, Holland             Sun Thumper (x4540)           30 TB
EVL/UIC, Chicago, USA               RAID array                    10 TB
DMC/Keio, Tokyo, Japan              RAID array                    8 TB
CESNET, Prague, Czech Republic      Sun Thumper (x4540)           48 TB
Ryerson U, Toronto, Canada          Two Sun Thumpers (x4540)      57 TB
AMPAS, Los Angeles, USA             Sun Thumper (x4540)           24 TB

and encoding/decoding for secure network distribution. Therefore, clusters of general-purpose computers are also critical resources for the CX.

5. Cluster storage. Several general-purpose PC clusters are deployed as part of the CX infrastructure, serving as rendering nodes for display walls or virtual reality systems with powerful CPUs and GPUs. They can be used as computational resources, or as distributed storage in the CX. For instance, a 30-node cluster with 1 TB hard drives in each node can be configured as a 30 TB shared storage system. Special access protocols [10–13] have been developed to leverage these low-cost, high-performance parallel storage systems.

All the above resources describe the CineGrid Exchange infrastructure, which comprises the foundation layer for the CX. They support the middleware layer deployed on top of this foundation. These resources are logically independent of each other unless upper-level software integrates them. The upper-level software in the CX is primarily built on top of iRODS [14], including extra components customized for the CX. Together they form the middleware level of the CX.

3.2. CX middleware

The CX middleware is an integration of tools and components, including distributed data grid software, metadata management software, scheduler, workflow engine, network protocols, etc.

3.2.1. iRODS: integrated rule-oriented data system

iRODS [14,15] plays a significant role in the CineGrid Exchange middleware layer. It organizes distributed data into a shareable collection, allowing the user to view files stored at multiple locations as a single logical collection. In the CX, each site runs an iRODS server, which manages the local resource. iRODS servers can exchange information directly. State information is stored in a centralized iRODS CATalog (iCAT) server. Given a request from a client, the iCAT metadata catalog is accessed to authenticate the user and validate authorization. The location containing the desired file is identified and the request is forwarded to the remote storage location. A distributed rule engine located at the remote storage location applies local policies that control functions such as redaction, metadata extraction, replication, and retention. Once a job is assigned to one iRODS server, an iRODS agent is spawned locally in response to the request. For example, if an iRODS client in Tokyo wants to retrieve files from the storage system located in San Diego, the iRODS server running in San Diego will receive a request and spawn a new instance of an iRODS Agent to handle that request and send back the files to the requesting client in Tokyo.

One key feature of iRODS is its programmability, which is exposed through the iRODS API as rules. Rules are critical to the CX because they make the CX more than just a decentralized FTP server: they turn it into an intelligent distributed storage platform (see Section 4 for further details).
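As an illustration of this single logical namespace (this is our own sketch, not part of the CX middleware; the zone name /cinegridZone and the file paths are hypothetical), a client can move data in and out of the shared collection with the standard iRODS icommands, here wrapped in a few lines of Python:

    import subprocess

    def icmd(*args):
        # Run one iRODS icommand and raise an error if it fails.
        subprocess.run(args, check=True)

    # Ingest a local file; iRODS stores it on whichever physical resource it selects.
    # The -K flag registers and verifies a checksum in the iCAT.
    icmd("iput", "-K", "clip_0001.dpx",
         "/cinegridZone/home/operator/dropBox/clip_0001.dpx")

    # A client at any other CX site sees the same logical path and can fetch the file;
    # -K re-verifies the checksum after the transfer.
    icmd("iget", "-K", "/cinegridZone/home/operator/dropBox/clip_0001.dpx",
         "local_copy.dpx")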

3.2.2. CX catalog and metadata management

Today, indexing digital content poses many challenges, as extracting information from images or videos usually requires human intervention. Often, the availability of metadata is the only option to index, search and manage digital content. In the CX, iRODS uses a centralized database (iCAT) to manage file metadata, indexing the location, the directory structure, file names, creation dates, checksums, etc. Through the ‘‘imeta’’ command, iRODS provides the option to associate additional, user-defined metadata called Attribute-Value-Unit (AVU) metadata with individual files and directories inside iRODS to support metadata searches.
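For illustration (the attribute names and the logical path below are hypothetical examples, not the CX cataloguing schema), AVU metadata can be attached to and queried on a data object with the imeta command:

    imeta add -d /cinegridZone/Content/clip_0001.dpx Title "CineGrid test clip"
    imeta add -d /cinegridZone/Content/clip_0001.dpx Resolution 4096x2160 pixels
    imeta ls -d /cinegridZone/Content/clip_0001.dpx
    imeta qu -d Resolution = 4096x2160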

However, the CX's requirements for metadata-based searches go beyond what iRODS can offer. Therefore, we intend to integrate a more powerful metadata management tool into the CX called CollectiveAccess [16]. CollectiveAccess is open source software that provides a highly configurable cataloguing tool and a web-based user interface for museums, digital archives, and digital collections. As part of the CX project in 2009, CollectiveAccess was enhanced to handle cinema and other high quality media assets, and an interface between CollectiveAccess and iRODS has been designed so that the CX will be able to use CollectiveAccess as the ‘‘front-end’’ of its iRODS-managed distributed repository in the next phase of implementation of the CX.

3.2.3. CX management components

When iRODS and CollectiveAccess are integrated in 2010, CineGrid Exchange development will have covered many of the basic goals of a distributed storage platform for high quality digital media. But there are still unaddressed challenges. These challenges require additional management components to be added to the middleware layer of the CX. We can implement these components using the iRODS programmable interfaces, and improve the overall system's intelligence and configurability. Although implementing all of these sophisticated rule-based components is not the goal of the first phase, it is very important that preliminary steps are compatible with future upgrades. Generally, we start with simple implementations, and will refine them as we move forward.

3.2.3.1. Policy and workflow management. The implementation of the policy management component is an important factor in digital content management. It determines the workflows of how the data are ingested, distributed, archived, etc., as well as ensuring that the digital content processes are complete and no steps are missed. The current CX policies and workflows are kept somewhat flexible because the system requirements are still evolving. The first set of policies has been implemented and deployed, and we will discuss them in detail in Section 4.

3.2.3.2. Resource management. Resources in the CX are heterogeneous and will likely always remain so. As a result, we will need a resource management component to match resources to requests. Finding the right resources for a particular request not only requires full knowledge of all resource capacities and performance, but also an understanding of how they are connected to the CX infrastructure and to the requesters. We expect that resource management will continue to require a great deal of careful thought and planning.


Fig. 4. The CX Architecture.

3.2.3.3. Other management components. Some of the management modules presently under consideration are: a job scheduler to schedule parallel jobs; fast parallel file transfer/streaming protocols to utilize network bandwidth most efficiently; computational components to perform compression/decompression, encoding/decoding and transcoding; a way to monitor the distributed system for overall reliability and fault tolerance, etc. Reporting/auditing has also been identified by the user community as important.

At present, basic policies and workflows have been implemented to ensure CX operability. Future research will aim to improve and expand the current rudimentary implementations. Where it makes sense to add middleware automata, functions currently managed by people will be automated. Because the CX is intended to also serve as an experimental test bed for CineGrid members, it is also anticipated that external modules may need to be added in the future.

The CX middleware is largely based on iRODS, and will use CollectiveAccess as a front-end to the iRODS metadata management scheme to allow more complete metadata indexing and user-friendly searches. The iRODS rule engine provides the programming interface for CX policies to support CX applications. The CollectiveAccess application provides the user interface for media catalog browsing.

3.3. CX applications: ingest, distribution, streaming, real-time processing, search, etc.

From the very beginning, the CX has been driven by applications such as ultra-high quality video streaming. As mentioned before, previous demonstrations were successful, but were arduous and temporary, and once the demonstrations were over, the created infrastructure was taken down. A central goal of the CX is to support applications consistently by defining workflows and providing management functionality on a highly reliable platform.

In summary, the CX is designed as a three-level hierarchical structure, as illustrated in Fig. 4. At the lowest level is the infrastructure, network and storage resources linked together by lambda networks; the middle level is iRODS, metadata management tools (CollectiveAccess) and other management components that manage the content repositories and implement policies to support CX applications, which represent the top level in this model.

4. CX policies and workflows

The CineGrid Exchange Working Group defines policies which are then expressed as workflows to be implemented using

iRODS rules. From the storage platform perspective, supporting workflows as part of the storage system functionality is fairly novel, compared to a traditional storage system which mainly supports POSIX I/O operations like fopen, fread, fwrite, fclose, and fseek. The definition of the workflows in the CX is largely based on real-world digital content workflows. Although most workflows still need human assistance, the goal of the CX is to make them as automatic as possible. Automation reduces the chance for human error or oversight and lessens the administrative burden of archive maintenance.

Content in the CX repository must be securely stored and distributed, and accurately retrieved. Once accepted into the CX collections, content should be preserved until a decision is made to purge it from the collection. In addition, asset use is constrained by copyright considerations. Permission to use CX content varies widely. Though CX assets are limited to use by the CineGrid membership only, content is also classified for use based on the member type. In some cases, an additional layer of constraint is imposed based on contributor restrictions for the intended use of the content.

It is widely accepted that metadata is also critical to digital content preservation due to the technical difficulties in automatic indexing of digital content; however, people are reluctant to create metadata manually, compared to their enthusiasm for creating the content. So, it is critical to incorporate a workflow that mandates metadata generation at the time of content ingest into the CX. Cataloguing, searching and referencing digital media content without sufficient metadata is next to impossible, and far more taxing than simply storing or transferring files within a storage system.

Therefore, the CX policies are made to guarantee the safety, security, accuracy and manageability of digital media assets in the CX collection while making them easily accessible to community members. While these policies may seem straightforward at first glance, it was not easy to define them, especially because we want sufficient detail to automate them using iRODS-based rules. The content should be stored as multiple copies in a distributed manner, so that even if one site experiences a catastrophic failure, there are still identical copies elsewhere, a digital preservation strategy first explored in the LOCKSS Program [17]. Periodic data verification is required to ensure that the archived content has not changed. Our policies avoid the accidental deletion of content by implementing a strict deletion protocol. After many refinements by the CineGrid Exchange Working Group, the first set of CX policies has now been defined. These policies cover the most commonly used processes in the operation of the CineGrid Exchange. Two of the most important workflows are examined here.

4.1. CX ingest workflow

To put content into the CX, the content holder first submits an application to the (human) CX curator with a preliminary description of the material, including proxy and resource requirements. The CX curator reviews the application and responds after checking related information, e.g., usage permission and ownership. If the curator decides that it is okay to include the material, the content holder completes metadata entry into the cataloguing system via a web form. If the content holder has a fast network connection to the CX, or operates a CX node of their own, they are ready to proceed. Otherwise, they must first ship their content on a hard drive to one of the established CX nodes and connect the hard drive to the CX. This requirement stems from the fact that the CX resides not on the public Internet, but is part of a limited access research network.

For security reasons, CX node operators are not allowed direct access to the CX repositories. Each node operator is assigned


Fig. 5. The CX ingest workflow (the portion of the workflow with a dark background is automated using the CX middleware).

a dropBox under his home directory within the CX, which is also managed by the CX middleware, but with appropriate user-level permissions. The content will be transferred into the node operator's home dropBox, and will be checksummed. If necessary, files with mismatched checksums will be retransferred until all checksum tests pass. At this time, automated iRODS processes take over. After the upload is completed, an internal workflow is activated through iRODS rules, which works like a system call in a UNIX/Linux system. The workflow will do the following things in this sequence:

1. Switch the mode of the operator to super-user of the CX.

2. Use the resource-matching algorithm to identify two additional storage resources from all resources in the CX. The resource-matching algorithm in the CX randomly selects two more resources from the resource collection, which includes all the available resources in the CX. More advanced algorithms are under development, which will consider other variables such as remaining storage space, storage quota, and other features supported by the latest release of iRODS (see the sketch after this list).

3. Replicate two copies to the two identified resources and perform a checksum operation. (It has been extensively discussed in the digital preservation community that three copies provide peace of mind and should be kept for digital content.) Files are sequentially copied to three locations. With RBUDP integrated into our implementation, we have not found this to be a major bottleneck. However, as a long-term goal, we will consider pipelining the process by overlapping the file transfers and checksum operations for the same sequence of files.

4. Move the files from the content holder's home dropBox into the user's corresponding directory in the repository.

5. Complete the process and register the files into the CX database.

6. For all copies of each file, set the right access permissions.

7. A notification email is sent to the CX curator and content holder confirming successful ingest into the CX.
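To make the automated portion concrete, the following Python sketch mirrors steps 2–6 above using the standard iRODS icommands. It is an illustration under our own assumptions (the resource names, zone paths and the icmd wrapper are hypothetical); the production CX performs these steps server-side through iRODS rules and micro-services, as shown in Section 5.

    import random
    import subprocess

    # Hypothetical resource names; the real CX resource group differs.
    CX_RESOURCES = ["calit2Resc", "uvaResc", "keioResc", "evlResc", "cesnetResc"]

    def icmd(*args):
        # Run one iRODS icommand, raising an error on failure.
        subprocess.run(args, check=True)

    def ingest(drop_path, repo_path, owner):
        # Step 2: randomly select two additional resources for the replicas.
        extra = random.sample(CX_RESOURCES, 2)
        # Step 3: replicate the object to each selected resource, then checksum all replicas.
        for resc in extra:
            icmd("irepl", "-R", resc, drop_path)
        icmd("ichksum", "-a", drop_path)
        # Step 4: move the object from the dropBox into the repository collection.
        icmd("imv", drop_path, repo_path)
        # Steps 5-6: the iCAT already tracks the object; grant read access to the owner.
        icmd("ichmod", "read", owner, repo_path)

    ingest("/cinegridZone/home/operator/dropBox/clip_0001.dpx",
           "/cinegridZone/Content/operator/clip_0001.dpx",
           "operator")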

Guided by the content deletion policy, a non-delete rule will automatically be associated with the new directory so that it cannot be removed through any unauthorized interface or commands. By


the content preservation policy, a periodic auditing rule is associated with the directory to do a bimonthly audit of the files in the directory.

Fig. 5 illustrates the ingest workflow. The first few steps require human assistance, but can be partially automated through email/web applications; the portion of the workflow with a dark background is automated using the CX middleware.

4.2. CX distribution workflow

At present, to retrieve protected content from the CX, the requester contacts the CX curator with their request, specifying the name of the data object, data type (compressed version, uncompressed, with/without sound, etc.), purpose of use, etc. The curator will evaluate the user's CineGrid membership status to see if he is in good standing with a signed usage agreement. Though presently these steps are handled manually, a fully automated workflow is planned in the near term and will be accomplished through a combination of web applications and CX middleware.

Following an initial request for content, the metadata management interface of the CX will be searched to identify assets for distribution. The availability of the requested content will be returned from the search. A resource management module in the CX will identify the best resource to fetch the content from, set the parameters and start the transfer using either TCP or UDP. All fetched files will have their checksums calculated and compared to the source automatically after the transfer. If the requester does not have network access to a CineGrid node, an external hard drive will be used to distribute the content.

Once the workflow is finished, the curator and requester will receive confirmation emails from the system, and the CX curator will follow up with the requester to confirm that the content is correctly received (see Fig. 6).

The design goal of the application workflows in the CX is to make it easier for applications to run on top of the CX. While the CX development to date has focused on implementing basic repository functions such as ingest, distribution, preservation and deletion, it is anticipated that more complex workflows can be implemented using the current platform with additional software effort.

For example, suppose a content creator (A) creates a clip that another CineGrid member (B) wants to view. However, B does not have enough storage space to hold the uncompressed content and can only view compressed 4K clips, while A does not have the equipment to compress the content. If both A and B are connected to the CX platform, an application workflow can resolve this problem by leveraging other available CineGrid resources. After A has created the content, the content is ingested into the CX. Assuming user C has a codec that can compress the content, C can retrieve the content, compress it and ingest it into the CX again. B can now stream the compressed content from the CX to his 4K projectors using a streaming protocol in the CX, viewing the content in real time.

This application example uses various components of the basic ingest and distribution workflows. More complicated workflows will clearly evolve as application requirements expand and further functional requirements are explored. Demands faced by the digital motion picture industry will be a driver for further research and development in this area.

4.3. Workflow description and implementation

The workflows in the CX are described using the iRODS rule description language and are implemented as micro-service modules within iRODS using the C programming language. The iRODS rules are scripts that are interpreted by the iRODS

rule engine running on the central server of the CX. Adding a new rule using existing micro-services takes effect immediately, without recompiling or restarting any of the iRODS servers. Only administrators of the CX can add/remove/modify rules in the CX. If existing micro-services are not sufficient, users can also program new micro-services using the iRODS micro-service APIs. This is also a privilege of the CX system administrators and requires recompiling and restarting the CX system.

In general, the rules are declared in the single iCAT server in the CX, and the execution of workflows can be either centralized or distributed. This depends on the APIs provided by the iRODS system and how the rules and micro-services are implemented by the users. In the current CX implementation, all rules are executed in one centralized place: the iCAT server in the CX. However, multiple rules can run in parallel. For example, the ingest flows can be invoked by multiple users at the same time, and the file transfers can be performed simultaneously with separate iRODS agents.

5. Experiments

The first phase of the CX is online now and is accessible to CineGrid members as a distributed storage platform, which manages all CineGrid assets and provides support for distributed content access. Tens of terabytes of digital content have been ingested through CX interfaces. The total size of the content in the CX is about 50 TB now, and is growing rapidly. In addition to its practical function, the CX is also used as a large-scale test bed for research purposes.

In this section, we evaluate the CX in two major aspects: basic functions and workflow experiences.

5.1. CX functionalities

The CX is a distributed storage system which can be used to put/get files. Fig. 7 shows a scenario where the system administrator, node operators or the CX curator can use the iRODS command line interface to navigate the current CX directories, put files into the CX for later retrieval, etc. The figure also shows that the CX platform exists in parallel with a local file system, and each can be freely accessed. The convenience provided by this approach makes it easier to use than a traditional FTP server.

Fig. 8 shows the browser interface, which can be accessed from any browser with Internet access. Files stored in the CX can be previewed (assuming there is a reduced bitrate proxy or thumbnail) through this interface. For example, when there are many versions with the same title, it is often difficult for a system administrator to readily identify files. The browser interface can facilitate communication with the content creator or other system administrators without the need for cumbersome downloading and converting of files, emailing large attachments, or verbally describing the content in question.

As a distributed platform, the transfer speed between sites is of critical importance. CX data transfer operations are done through its file transfer protocols, integrated in iRODS. Initially, iRODS only supported the TCP protocol; UDP was added to iRODS as part of early CX development efforts to boost performance. Better protocols to further utilize the 10 Gbps bandwidth connecting most CX nodes are being explored. Table 2 summarizes the current data transfer speeds in the CX among the three major sites.

A 10 Gbps interconnection is typical for the CX. However, this high bandwidth network seems over-provisioned given the currently available network protocols, because none of the current protocols in the CX can entirely fill the 10 Gbps pipes. For TCP, due to the high Round Trip Time (RTT) values, even with iRODS using 16 parallel TCP threads the throughput is still only <2% of the capacity of the network bandwidth.
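A rough calculation (ours, assuming a default 64 KB TCP window and round-trip times on the order of 100–150 ms between the CX sites) shows why: single-stream TCP throughput is bounded by window size divided by RTT, i.e., 64 KB / 0.12 s ≈ 4.4 Mbps per untuned stream, and even with aggressively tuned windows, keeping a 10 Gbps path full at 120 ms RTT requires about 10 Gbps × 0.12 s ≈ 150 MB in flight.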


Fig. 6. The CX distribution workflow (the portion of the workflow with a dark background is automated using the CX middleware).

Table 2. The CX file transfer performance.

Network bandwidth              10 Gbps          10 Gbps           10 Gbps
Source                         Tokyo, JP        San Diego, US     Amsterdam, NL
Destination                    San Diego, US    Amsterdam, NL     Tokyo, JP
Read speed                     ~4.9 Gbps        ~5.4 Gbps         ~5.3 Gbps
Write speed                    ~3.8 Gbps        ~3.8 Gbps         ~3.6 Gbps
Iperf test (single process)    ~6.4 Gbps        ~6 Gbps           ~6 Gbps
Disk-to-disk TCP (1 stream)    ~160 Mbps        ~150 Mbps         ~150 Mbps
Disk-to-disk UDP (1 stream)    ~1.0 Gbps        ~1.0 Gbps         ~0.9 Gbps

Our UDP protocol is derived from single-threaded RBUDP [18], and can achieve around 10% of the maximum bandwidth. On the other hand, we can improve network bandwidth utilization by executing

parallel tasks. We have experimented on the CX platform with parallel data transfers between the CX sites. Fig. 9 shows one of them, which transfers a series of 1 GB digital files from San


Fig. 7. The CX command-line access interface (through iRODS).

Fig. 8. The CX web access interface (through iRODS). 4K images, for example a GLIF map, can be retrieved through the interface.

Fig. 9. Parallel transfer throughput from San Diego to Tokyo (x-axis: number of parallel transfers; y-axis: overall data transfer speed in MBytes/s).

Diego to Tokyo via the 10 Gbps optical networks in the CX. The measurements are the overall throughput of all parallel transfers based on UDP and TCP respectively, and are plotted as we increase the number of parallel transfers. With parallel transfers, TCP throughput increases linearly and reaches a maximum of around 240 MBytes/s with 14 or more parallel streams. UDP reaches its maximum of 360 MBytes/s with 3–4 parallel streams, and degrades significantly with more parallel streams, caused by a higher UDP packet loss rate. The bottlenecks are protocols, disk I/O speed, memory copies, etc.
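For scale (our own conversion), 240 MBytes/s corresponds to roughly 1.9 Gbps and 360 MBytes/s to roughly 2.9 Gbps, i.e., about 19% and 29% of the 10 Gbps link respectively, so even the best single-host configuration leaves most of the lambda idle.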

Therefore, in the CX, the way to really fill a 10 Gbps pipe is to use a parallel data transfer protocol that transfers data from multiple nodes to multiple nodes. Some related work has been done on scalable parallel transfer protocols [11,19,20] with preliminary results, but these protocols have not been integrated into the CX yet.

Nevertheless, even using the current CX implementation, transferring terabytes of data from Japan to the USA only takes a few hours. In our past experience, the CX platform has been used by CineGrid members to move around large amounts of critical data to meet urgent deadlines, which would not have been possible in any other way. The relatively high file transfer speed is also a good foundation for digital content workflows.

5.2. CX policy implementation and evaluation

The CX uses iRODS rules for the major part of the CX workflows. A rule in iRODS defines a series of functions that will be activated at a certain point in time. All rules are listed in a few text files located at each storage server in the CX. The rule engine running at the same server will sequentially examine all the rules. Once a match is found, the rule engine will activate the rule. The structure of a rule is ‘‘ruleName|conditions|operations|recoveries’’. When activated according to ruleName, the rule will first check the conditions, and if ‘‘true’’, it will execute the operations. The recoveries are used to handle exceptions during the execution of the operations. The operations and recoveries of the rule are programmable using the micro-service interface of iRODS. A workflow may consist of multiple rules, and each rule may consist of more than one operation/recovery.

For example, the three rules associated with the CX ingest workflow are shown below.

acPostProcForPut|$objPath like /home/user/dropBox/*|msiDataObjRepl($objPath, CineGrpResc, *st)##msiDataObjRepl($objPath, CineGrpResc, *st)##msiDataObjAutoMove($objPath, /home/user/dropBox, /CinegridContent/user/Content, CineCurator, true)##|nop##nop##nop

acDataDeletePolicy|$objPath like /CinegridContent/usr/Content*|msiDeleteDisallowed|nop

acDigitalPreserve||delayExec(<ET>6</ET><EF>60d</EF>), msiAutoReplicateService(/CinegridContent, true, 3, CineGrpResc, null, nop)|nop

The first rule, ‘‘acPostProcForPut’’, is the post-processing rule and will be activated upon the completion of ingesting files/directories. Once activated, it first checks the path of the file/directory, and if it is in the node operator's dropBox, the rule calls a micro-service, ‘‘msiDataObjAutoMove’’, included in recent iRODS releases, to move the files into the repository. Following that, two replicas are made to other sites in the resource group ‘‘CineGrpResc’’ by calling another micro-service, ‘‘msiDataObjRepl’’, which will automatically find a suitable available resource based on the resource management modules. The second rule disables the deletion permission from any user interface using the ‘‘msiDeleteDisallowed’’ micro-service and ties into the ‘‘deletion workflow’’, which is not described here.


Table 3. Checksum overhead in the ingest workflow.

                      Sequential 50 MB files          Sequential 1 GB files
Ingest content        SD → LA      SD → Prague        SD → LA      SD → Prague
Replication cost      75%          74%                50%          42%
Checksum cost         25%          26%                50%          58%

The third rule invokes a delayed execution that audits the content in a two-month cycle and ties into the ‘‘preservation workflow’’, which is also not described here. This micro-service guarantees the integrity of the three replicas by recovering a failed replica or making one more replica on another available storage resource in the resource group. Finally, it marks the failed replica ‘‘obsolete’’ in the database.

Since the deployment of the CX workflows in September 2009, a few terabytes of digital content have been ingested through the defined workflows. Our experience with the ingest workflow suggests that checksums may be a potential bottleneck. Table 3 shows some results of checksum overhead compared to replication overhead. When ingesting 50 MB image files, the overhead of the checksum is about 25%; but when ingesting 1 GB files, the checksum overhead is more than 50%. This is because the transfer becomes faster as the file size increases, while the checksum does not speed up accordingly. This means that the checksum overhead is substantial. It is expected that the ingest workflow can be revised to postpone the checksum operation to the auditing workflow, so that the ingest workflow will finish much sooner.
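As a rough plausibility check (our own estimate, assuming the checksum reads each file back from disk at roughly 100–150 MBytes/s while RBUDP moves data at about 1 Gbps, i.e., ~125 MBytes/s), a 1 GB file takes on the order of 8 s to transfer and a comparable 7–10 s to checksum, so for large files checksum and replication costs of the same order, as Table 3 reports, are to be expected.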

For the distribution process, we have tested it by distributing several clips to a few CineGrid members. Groups at NPS, AMPAS, and SDSC are working on the metadata management (CollectiveAccess) and on the integration of CollectiveAccess with iRODS. The full distribution workflow will come online when this work is completed.

6. Conclusion and future work

After the first phase of design and implementation, the CX has become an online distributed platform with three hierarchical layers. At its infrastructure layer, the physical resources provide 250 TB of raw high-speed disk capacity at seven major global sites with 10 Gbps interconnections. At the middleware layer, iRODS manages all the different types of storage resources and provides easy file access commands and programmable rules to support CX policies. At the application level, the CX has implemented the first set of workflows for content ingestion, distribution, preservation and deletion. The structure of the CX is complete and ready for further development. The first three goals of the CX have been accomplished: large amounts of digital data transfers are occurring regularly and frequently between CX sites using TCP or UDP protocols; a variety of research experiments are ongoing using the CX; and the first set of workflows is functioning well with limited administrative support.

In the next phase of CX refinement and development, we will address some more challenging functions. We will work more on resource management and scheduling; job scheduling for parallel access; and streaming protocols to send digital content from the CX directly to players. The current resource management scheme will be improved to find the right resource for the right application, with automatic resource selection based on bandwidth, disk capacity, disk speed and geographical information. A distributed database is required for scalability and reliability reasons. More intelligent and reliable workflows, which are required to meet the demands of digital applications, will be defined and evaluated. And further work

towards integrating cluster resources, parallel transfer protocols, and parallel storage access for content encoding/decoding as part of the CX will be pursued.

Acknowledgements

This work is a joint effort of The Academy of Motion Picture Arts and Sciences (AMPAS), the San Diego Supercomputer Center, Calit2 at UCSD, the Naval Postgraduate School in Monterey, CA, EVL at the University of Illinois at Chicago, Keio University in Japan, CESNET in the Czech Republic, UvA in the Netherlands, Ryerson University in Canada, the University of Washington Research Channel, Pacific Interface in Oakland, CA, and many other CineGrid partners.

References

[1] ‘‘The Digital Dilemma’’, The Science and Technology Council of the Academy of Motion Picture Arts and Sciences, AMPAS, Los Angeles, CA, February 2008.
[2] GLIF, http://www.glif.org.
[3] G. Pieper, T.A. DeFanti, Q. Liu, M. Katz, P. Papadopoulos, J. Keefe, G. Hidley, G. Dawe, I. Kaufman, B. Glogowski, K. Doerr, J.P. Schulze, F. Kuester, P. Otto, R. Rao, L. Smarr, J. Leigh, L. Renambot, A. Verlo, L. Long, M. Brown, D. Sandin, V. Vishwanath, R. Kooima, J. Girado, B. Jeong, Visualizing Science: The OptIPuter Project, SciDAC Review, Issue 12, Spring 2009, IOP Publishing in association with Argonne National Laboratory, for the US Department of Energy, Office of Science, pp. 32–41.
[4] Larry Smarr, The OptIPuter and its applications, in: 2009 IEEE LEOS Summer Topicals Meeting on Future Global Networks, July 22, 2009, pp. 151–152, doi:10.1109/LEOSST.2009.5226201.
[5] CineGrid, http://www.cinegrid.org.
[6] Thomas A. DeFanti, Jason Leigh, Luc Renambot, Byungil Jeong, Larry L. Smarr, et al., The OptIPortal, a scalable visualization, storage, and computing interface device for the OptIPuter, Future Generation Computer Systems 25 (2) (2009) 114–123. Elsevier, February.
[7] Thomas A. DeFanti, Gregory Dawe, Daniel J. Sandin, Jurgen P. Schulze, Peter Otto, Javier Girado, Falko Kuester, Larry Smarr, Ramesh Rao, The STARCAVE, a third-generation CAVE and virtual reality OptIPortal, The International Journal of FGCS (ISSN: 0167-739X) 25 (2) (2009) 169–178.
[8] Laurin Herr, et al., International Real-time Streaming of 4K Digital Cinema, demonstration at the iGrid Conference, 2005, http://www.igrid2005.org/program/applications/videoservices_rtvideo.html.
[9] NLR, National LambdaRail, http://www.nlr.net.
[10] J. Leigh, L. Renambot, A. Johnson, R. Jagodic, H. Hur, E. Hofer, D. Lee, Scalable adaptive graphics middleware for visualization streaming and collaboration in ultra resolution display environments, in: Proceedings of the Workshop on Ultrascale Visualization, UltraVis 2008, Austin, TX, November 2008.
[11] Shaofeng Liu, Jurgen P. Schulze, Thomas A. DeFanti, Synchronizing parallel data streams via cross-stream coding, in: 2009 IEEE International Conference on Networking, Architecture, and Storage, 2009, pp. 333–340.
[12] V. Vishwanath, J. Leigh, E. He, M.D. Brown, L. Long, L. Renambot, A. Verlo, X. Wang, T.A. DeFanti, Wide-area experiments with LambdaStream over dedicated high-bandwidth networks, in: IEEE INFOCOM, 2006.
[13] H. Xia, A.A. Chien, RobuSTore: a distributed storage architecture with robust and high performance, in: Proceedings of the ACM/IEEE International Conference on High Performance Computing and Communications (SC'07), November 2007.
[14] A. Rajasekar, R. Moore, C-Y. Hou, L. Christopher, R. Marciano, A. de Torcy, M. Wan, W. Schroeder, S-Y. Chen, L. Gilbert, P. Tooby, B. Zhu, iRODS Primer: Integrated Rule-Oriented Data Systems, Morgan & Claypool Publishers, San Rafael, CA, ISBN: 9781608453337, January 2010.
[15] iRODS, http://www.irods.org.
[16] CollectiveAccess, http://www.collectiveaccess.org/.
[17] LOCKSS Program, http://lockss.stanford.edu/lockss/Home.
[18] E. He, J. Leigh, O. Yu, T.A. DeFanti, Reliable Blast UDP: predictable high performance bulk data transfer, in: IEEE Cluster Computing 2002, Chicago, Illinois, September 2002.
[19] V. Vishwanath, T. Shimizu, M. Takizawa, K. Obana, J. Leigh, Towards terabit/s systems: performance evaluation of multi-rail systems, in: Proceedings of Supercomputing 2007, SC07, Reno, NV.
[20] V. Vishwanath, J. Leigh, T. Shimizu, S. Nam, L. Renambot, H. Takahashi, M. Takizawa, O. Kamatani, The Rails Toolkit (RTK) - enabling end-system topology-aware high end computing, in: The 4th IEEE International Conference on e-Science, December 2008, pp. 7–12.


Shaofeng Liu received his Ph.D. from the Computer Science and Engineering Department at the University of California, San Diego. His research interests include networking protocols, distributed systems, job scheduling, etc. Shaofeng holds a B.S. degree and an M.S. degree from Tsinghua University in Beijing, China. He worked for IBM for two years in Beijing.

Jurgen P. Schulze is a Research Scientist at the California Institute for Telecommunications and Information Technology, and a Lecturer in the computer science department at the University of California, San Diego. His research interests include scientific visualization in virtual environments, human–computer interaction, real-time volume rendering, and graphics algorithms on programmable graphics hardware. He holds an M.S. degree from the University of Massachusetts and a Ph.D. from the University of Stuttgart, Germany. After his graduation he spent two years as a post-doctoral researcher in the Computer Science Department at Brown University.

Laurin Herr is founder and president of Pacific Interface Inc., an international consulting company that facilitates research and business between Japan, America and Europe. For nearly 30 years, Pacific Interface has been analyzing trends in media, computing, video/graphics, displays and networking applications on behalf of clients worldwide wishing to explore new markets. In addition to strategic consulting and business development services, Pacific Interface provides clients a wide range of specialized services to organize and manage research collaborations, technical symposia, technology showcases, and media events. From 1992 to 2004, concurrent with his activities at Pacific Interface, Herr also held senior management positions at Silicon Valley technology companies SuperMac, Radius, Truevision, and Pinnacle Systems. He has also worked extensively as an independent video producer/director. From 1982 to 1992, he was the official liaison to Japan for ACM SIGGRAPH. He was on the board of directors of the National Computer Graphics Association (NCGA) in 1988–1989. He has served as an advisory member of the Digital Cinema Consortium of Japan since its inception in 2001. Herr is a member of SMPTE and ACM SIGGRAPH. After receiving his Bachelor of Arts degree from Cornell University in 1972, Herr studied Japanese intensively in the U.S. and Japan, and pursued additional graduate studies at Cornell and at Sophia University in Tokyo. He also holds a fifth-degree black belt in the martial art aikido.

Jeffrey D. Weekley is a 3D modeler, programmer and multimedia specialist working at the Naval Postgraduate School in Monterey, California. He works with NPS's MOVES Institute, whose ‘‘expertise includes combat modeling systems, training systems, virtual environments, augmented reality, web technologies, networks, and interoperability’’. It also ‘‘excels in agents and artificial intelligence, human–computer interaction and human factors, education and distance learning’’.

Bing Zhu Since 1999, Dr. Zhu has been working on many research and software development projects in data intensive computing areas funded by government agencies such as the National Science Foundation and the National Archives and Records Administration. His current research interests include digital archive formats, distributed storage, long-term digital preservation, and policy-driven systems.

Natalie V. Osdol Pacific Interface, Inc. (PII), is an international consulting company facilitating research and business between Japan, America and Europe. Affiliated with PII since 1982, Natalie Van Osdol has managed many international events and conferences produced by PII, including technical workshops, digital cinema symposia, technology demonstrations, exhibitions at international trade shows, press events and two museum exhibitions of computer graphics art in Japan. She produced the first US and European demonstrations of 4K digital cinema and was associate producer of the Visualization: State of the Art series of video reports published by ACM/SIGGRAPH. In collaboration with NTT Corporation and the Whitney Museum of American Art, she was also the producer of The American Century: A Director's Preview, the first multimedia showcase of fine art using super-high-definition (SHD) imaging technology. Van Osdol was also a founding partner of Compression Technologies, Inc., a company dedicated to the development and licensing of digital video compression tools. She attended Sophia University in Tokyo, Japan, and UCLA.

Dana Plepys is responsible for administration of EVL's advanced research, as well as managing EVL's collaborations and technology transfer with industry and affiliated laboratories. She assists in the development of tools, techniques, and systems for scientific and artistic VR and visualization applications, and the development and production of Web and video documentation of EVL research and activities. Plepys is also responsible for EVL's business affairs and the finances of grants, contracts, and internal funding. Her joys include supervision of graduate students and serving as an advisor on graduate thesis committees for the Master of Fine Arts degree. For the past two years Plepys has been the director and curator of the CineGrid Exchange. Since 1993 Plepys has been the editor of the SIGGRAPH Video Review (SVR), one of the world's most widely circulated and comprehensive video-based publications showcasing the latest concepts in computer graphics and interactive techniques. She has produced over 100 issues of the SVR, and is responsible for production, publication and media distribution. Plepys is actively involved in the initiative to preserve and digitize SVR's historical archives (1979–2008).

Mike Wan is the chief software architect of the Integrated Rule-Oriented Data System (iRODS) and the Storage Resource Broker (SRB). iRODS is a follow-on of SRB which uses rules to customize and enforce site-dependent data management policies. The iRODS and SRB systems are being used worldwide by more than 100 universities and federal agencies.