Multimedia Tools and Applications, 25, 37–58, 2005. © 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.

Analysis and Description of the Semantic Content of Cell Biological Videos

A. RODRÍGUEZ [email protected]
N. GUIL [email protected]
Computer Architecture Department, University of Malaga, ETSI Informática, Campus de Teatinos, 29017 Malaga, Spain

D.M. SHOTTON [email protected]
Image Bioinformatics Laboratory, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK

O. TRELLES [email protected]
Computer Architecture Department, University of Malaga, ETSI Informática, Campus de Teatinos, 29017 Malaga, Spain

Abstract. We present a system for extracting and organizing semantic content metadata from biological microscopy videos featuring living cells. The system uses image processing and understanding procedures to identify objects and events automatically within the digitized videos, thus leading to the generation of content information (semantic metadata) of high information value by automated analysis of their visual content. When such metadata are properly organized in a searchable database conforming to our proposed model, a content-based video query and retrieval system may be developed to locate particular objects, events or behaviours. Furthermore, these metadata can be applied to find hidden semantic relationships, such as correlations between behavioural alterations and changes in the environmental conditions of cells. The suitability and functionality of the proposed metadata organizational model is demonstrated by the automated analysis of five different types of biological experiments, recording epithelial wound healing, bacterial multiplication, the rotations of tethered bacteria, and the swimming of motile bacteria and of human sperm. We have named our prototype analytical system VAMPORT (video automated metadata production by object recognition and tracking). We conclude that the use of an audiovisual content description standard such as MPEG-7 will contribute valuably towards the interoperability of the system.

Keywords: video content recognition, semantic content analysis, content description, metadata database, biological videos

1. Introduction

The growth of the information culture has led to a revolution in audio-visual information systems, which the scientific community has only recently started to address with the development of biological image databases [2, 3, 7, 20, 34]. Despite these advances, vast quantities of valuable information contained in such databases are lost due to time constraints and the lack of proper tools for the automatic extraction of features and for the semantic interpretation of their content (e.g., the recognition of spatio-temporal events involving interactions between specific content items in the video).

There is thus a pressing need to develop automatic procedures for querying and recovering content-based information from this type of video. From our perspective, it is essential to define a general formal framework to represent information about the locations of living objects within such videos, the movements of these objects, and the fundamental dependency relationships among them. Establishing a coding consensus for such biological video metadata will allow the development of new indexing, query by content and retrieval technologies.

The ability to perform efficient searches on digital moving image data requires new standards for storing, cataloguing and querying videos. MPEG-2 has been developed as a format and compression system for video broadcasting and distribution, and is now accepted as the worldwide standard for high-quality digital video compression and transmission, having been fully adopted by the video industry. In contrast, current visual information retrieval systems, based on textual information and fuzzy pattern matching, require significant improvement, as do the tools for automated feature extraction. To this end, the scientific and industrial communities are currently dedicating significant effort to the development of new standards for cataloguing and querying multimedia content, such as the MPEG-7 standard being developed by the Moving Picture Experts Group (MPEG; http://www.mpeg.org/MPEG/starting-points.html).

The MPEG-7 standard (formally known as the Multimedia Content Description Interface), which focuses on the description of multimedia content, provides core technology for the description of audiovisual data content in multimedia environments, standardizing Descriptors, Description Schemes, Coding Schemes, a Description Definition Language, and System Tools [4, 5, 17–19, 28, 29]. Descriptors are representations that define the syntax and the semantics of each video content feature. In Section 2.4 we sketch the use of the MPEG-7 standard for describing applications of our metadata model. This MPEG-7 description will allow the full interoperability of our proposed system with other generic applications and tools (e.g., content browsers, editors, searching engines, etc.) that follow the MPEG-7 standard.

As a contribution towards the development of 'high level' semantic descriptors for scientific video content, we have undertaken an inter-disciplinary analysis of a specific collection of scientific video recordings that represent a particular field of biological experimentation, involving studies of the behaviours and interactions of living cells by light microscopy. Using this video collection we have identified the major types of biological video metadata, and propose methods for capturing them and a model for organising them.

The subject matter of the time-lapse and real-time video recordings chosen for this analysis includes: (a) the closure of in vitro wounds in epithelial cell monolayers under a variety of experimental conditions, such as exposure to drugs; (b) the in vitro proliferation of E. coli bacterial cells; (c) the rotational movements of flagellate Rhodobacter sphaeroides bacteria tethered to glass coverslips using an anti-flagellin antibody, which change the direction and velocity of their flagellar rotations either spontaneously or in response to environmental stimuli; (d) the movements of free-swimming Rhodobacter sphaeroides bacteria in response to such flagellar rotations; and finally (e) the motility of human sperm, used to evaluate their potential ability to fertilize human eggs.

Detailed analyses of those videos have allowed us to identify common and specific attributes for the characters (i.e., the cells) participating in the videos. The high-level descriptors we propose to describe conceptual aspects of the video content are derived from four simple character metadata: position, size, shape and orientation angle. The behaviours of the cellular characters are defined in terms of changes in these values and their evolution along the time axis.

Based on these parameters, we have defined a generic and extensible data model that provides a coherent description of the multimedia content of such videos. These metadata are then properly organized in a searchable database that supports subsequent queries to locate particular characters, events or behaviours, which may be correlated with changes of environmental conditions occurring during the recordings.

The suitability and functionality of the proposed metadata model is demonstrated by means of the automatic analysis of the different experiments. This analysis has involved: (a) the identification of the discrete objects in each image sequence, and the tracking of the movement of these objects in space and time, using image and video processing techniques; (b) the application of new image understanding and event analysis procedures that permit the automatic detection of specific behaviours and interactions; (c) the development of an automatic event annotation procedure used to populate a suitably organized video metadata database; and finally (d) the accessing and querying of the resulting video metadata database. Preliminary reports of this work have been presented at recent international conferences [24, 25, 32].

2. System and methods

The proposed strategy for video analysis and content-based querying can be formulated in the following steps (see figure 1):

Object detection. Initially, image processing is required to identify discrete objects (cells, 'characters') in each image sequence, and to track the movement of these objects along the space/time axis. To this end, various visual feature extraction algorithms have been combined and improved in the light of detailed biological knowledge of the systems.

Image understanding and event annotation. Using the data from the trajectories of the individual objects, we address a particular aspect of image understanding, namely the automatic detection of events. After event detection, an automatic event annotation procedure is used to populate the video metadata database in accordance with the organizational structure of the entity-relationship model described below.

Query formulation. The spatio-temporal attributes of the objects and events detected or defined in the previous steps are subsequently used by the video query and retrieval system. The result of a successful query is the identification of a particular video sequence or a set of video clips from among those stored in the database.

Figure 1. System overview. Initial image processing and image understanding procedures, supported by biological knowledge, allow the automated identification of objects and events (changes in the behavioural states of individuals and/or groups; interactions between content items), which may be supplemented by manual annotation, generating semantic metadata that are organised using a flexible and extensible data model and stored in a relational database. Subsequent queries result in the retrieval of video clips matching the search criteria.

2.1. Metadata definition

We have defined and distinguished three classes of metadata relating to images and video recordings [33].

Ancillary metadata are simple text-based details about an image or video, concerning, for example, the filename, title, format, subject and author, and technical details of the image or video preparation, but which do not relate specifically and directly to the visual content of the image or video frames. In MPEG-7 terminology, these elements, listed in [17], describe the content from the Creation & Production, Media, and Usage viewpoints. In most cases, these ancillary metadata are generated by the author of the image or video, and cannot be extracted from the image content.

Structural metadata relate to attributes of images and video frames such as colour, size, shape, texture and (in the case of video) motion. These are reducible to numerical image primitives that, once extracted, may be employed to describe the content from the MPEG-7 Structural Aspects viewpoint [17], or to locate images using the simple query by example paradigm "Find me any image or video frame that looks like this one." Such metadata may also be used to produce a storyboard for a video (transitions, scene segmentation, etc.), or to create the foundation for a frame-accurate index that provides non-linear access to the video [9]. The use of such structural metadata for searching image and video libraries has in recent years become the subject of intense research, with a large and expanding bibliography (see, for example, [6, 21, 30]), and proprietary software for this purpose is now available from a number of companies (e.g., [10]).

Semantic metadata, resulting from intelligent manual or automated analysis of the images or video frames, describe the spatial positions of specific objects within images, and the spatio-temporal locations of objects and events within videos. These metadata correspond to the MPEG-7 Conceptual Aspects viewpoint of the audiovisual content.

Of the three metadata types, we have focused our work on this latter one, which is the most rewarding in terms of information content, since it relates directly to the spatio-temporal features within a video that are of most immediate importance to our human understanding of video content, namely "Where, when and why is what happening to whom?" Subsequent query by content on this kind of metadata extends the query domain from the conventional one of textual keyword or image matching techniques to include direct interrogation of the spatio-temporal attributes of the objects of real interest within the video, and of their associated event information.

2.2. Metadata model

The core of the proposed metadata organization is the definition of three main types of entities: Media Entities, Content Items and Events [33]. The semantic metadata describing the video content according to this schema are stored using the data model presented in figure 2. It is worthwhile to highlight that, for interoperability, the proposed metadata model could easily be ported to MPEG-7 format by using the appropriate W3C XML schema used to encode MPEG-7 descriptors (see Section 2.4).

Media entities. Cell biological videos may be primary recordings of experimental observations, consisting simply of a single shot recorded as a continuous video sequence by a single camera with a fixed field of view, or they may be more complex, consisting of different shots and scenes edited together for a particular educational or research purpose. Definition of the media entities video, segment and frame is essential for the subsequent definition of the content items and events contained within them, the MPEG-7 concept of a video segment being used for flexibility, since this can be any number of contiguous video frames, including partial and complete shots and scenes, and can also accommodate gaps [17].

While some of the parameters, for example the time-lapse ratio (i.e., the ratio between the frame recording rate and the frame playback rate), are strictly ancillary metadata that cannot be derived from analysis of the video itself, they are included here for convenience. Supplementary items (Inserts, Annotations and Areas of Interest) relevant to the semantic content description of videos are not discussed in this paper [but see 33]. Other important ancillary metadata items relating to the media entities (e.g., video format, Digital Object Identifier [22, 23; www.doi.org], and INDECS rights metadata [26, 27; www.indecs.org]), which are not relevant to our present analyses, are also intentionally omitted from the following description.

Content items. Content items include both the animate characters (e.g., cells) and the inanimate objects (e.g., micropipettes) appearing in the video frames, which share many properties in common. They may be given separate labels or recorded in separate content item tables for convenience, and may each be organized into groups and classes. For example, individual bacterial cells may be divided into two groups, 'tethered' and 'free', both members of the class 'Rhodobacter sphaeroides bacteria'.

Figure 2. The metadata entity-relationship model for semantic video metadata. The model (simplified from [33]) shows the relationships that exist between content items, events and media entities, which are colour coded green, purple and light blue, respectively. To preserve simplicity and clarity within this two-dimensional diagram, characters and objects are here shown together as undifferentiated 'content items', and certain relationships (e.g., groups appearing in frames) have not been represented. In addition, the starts and ends of scenes have not been shown explicitly, and the derived parameters of groups, classes and event categories are shown associated with each of these items, rather than with their relationships to media entities, as should strictly be the case. The notations on the lines joining the items (0, 1 and/or N) show their entity relationships: one to one, one to many, or many to many, with zero indicating the possibility of there being no content items, events or supplementary items associated with a particular media entity.

For simplicity, in the subsequent text of this paper only characters are discussed, but the metadata organization for objects is identical.

Events. An event is an instantaneous or temporally-extended action or 'happening' that may involve a single character or object (e.g., a cell contracting), an interaction between two or more characters (e.g., a cytotoxic cell killing a target cell), or all the characters (e.g., perfusion of the observation chamber by a drug). It is helpful to group events into event categories to aid subsequent searching.
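
To make the model concrete, the following Python sketch shows one possible in-memory representation of the three entity types, together with the per-frame primary parameters introduced in Section 2.3 below. The class and field names are illustrative choices, not identifiers from the paper.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class MediaEntity:
        """A video, or a segment of it: any run of contiguous frames."""
        video_id: str
        start_frame: int
        end_frame: int

    @dataclass
    class ContentItem:
        """An animate character (e.g., a cell) or inanimate object (e.g., a micropipette)."""
        item_id: int
        label: str        # e.g., 'bacterium 17'
        group: str        # e.g., 'tethered' or 'free'
        item_class: str   # e.g., 'Rhodobacter sphaeroides bacteria'

    @dataclass
    class Event:
        """An instantaneous or temporally extended happening involving one or more items."""
        event_id: int
        category: str             # e.g., 'tumble', 'cell division', 'drug perfusion'
        participants: List[int]   # ids of the content items involved
        start_frame: int
        end_frame: int

    @dataclass
    class FrameRecord:
        """Primary spatio-temporal metadata for one content item in one frame."""
        item_id: int
        frame: int
        position: Tuple[float, float]   # centroid (x, y), in pixels
        size: float                     # area, in pixels
        shape: float                    # e.g., elongation of the cell outline
        orientation: float              # main-axis angle, in degrees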

2.3. Metadata generation

Measurement of primary spatio-temporal parameters. The primary semantic metadata relating to individual content items are, in our analyses, generated using traditional, enhanced or, in several cases, specific image processing techniques used to segment individual frames, from which the sequential spatial positions of the moving content items may be determined from frame to frame. For each content item that is correctly identified, the following primary spatial parameters are recorded for each video frame: position, size, shape and orientation. These values may then be used to describe the complex behaviour of the content items, as described below.

Calculation of derived parameters. From the positions, sizes, shapes and orientations of all the content items in every frame (the primary spatio-temporal semantic metadata), derived semantic metadata may be determined, for example the rate of growth of a particular cell, or the handedness of its rotation.

For motile cells, tracking procedures are applied to the primary spatio-temporal metadata in order to obtain the trajectory characteristics (motion history) of each content item, which include the start and end frames of a particular track, and the instantaneous velocity from frame to frame.

In addition to these instantaneous parameters, one can also compute average parameters over longer time periods, for example the average translational or rotational velocity, direction of movement or growth rate of a single cell, or the proportion of time spent in a particular state (e.g., stationary or swimming), either for the entire duration of the recording of a single identified character, or over a defined period (e.g., the last 50 frames). From these data one can also determine the variances of these parameters.

Furthermore, from the parameters relating to individual characters or objects, derived semantic metadata describing the population behaviour of an entire group or class of content items may also be obtained. Here it is necessary to define what we understand to be the appropriate metadata in the context of a population. In most cases, they will be the average values from all the component members of a group or class (mean size, average speed, etc.), together with the variance of the parameter and a record of the number of individuals comprising the group. However, in other cases alternative descriptors are appropriate. For example, in studying bacterial growth, the group size (i.e., the number of cells forming the entire colony) is the metric of primary importance (see below). Similarly, we may wish to record the group position, defined as the centre of gravity of the area occupied by the population, and the degree of dispersion or clustering within the population. We may also wish to compute some probabilistic information, such as the probability of occurrence of a given event within the population.
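
As an illustration of such derived and population metadata, the sketch below computes per-cell mean speeds and simple group statistics (mean, variance, centre of gravity, group size) from tracks of (frame, x, y) positions. The function names, the 25 frames-per-second default and the pixel units are assumptions made for the example.

    import math
    from statistics import mean, pvariance

    def instantaneous_speeds(track, fps=25.0):
        """track: list of (frame, x, y) tuples for one cell, ordered by frame.
        Returns the frame-to-frame speeds in pixels per second."""
        speeds = []
        for (f0, x0, y0), (f1, x1, y1) in zip(track, track[1:]):
            dt = (f1 - f0) / fps
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
        return speeds

    def population_summary(tracks, fps=25.0):
        """Derived population metadata: average speed over all individuals,
        its variance, the group's centre of gravity, and the group size."""
        per_cell = [mean(instantaneous_speeds(t, fps)) for t in tracks if len(t) > 1]
        xs = [x for t in tracks for (_, x, _) in t]
        ys = [y for t in tracks for (_, _, y) in t]
        return {
            "n_individuals": len(tracks),
            "mean_speed": mean(per_cell),
            "speed_variance": pvariance(per_cell),
            "group_centroid": (mean(xs), mean(ys)),
        }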

These derived parameters, examples of which are given in figure 3 below, may be predefined and recorded in the metadata database, or may be computed on the fly from the primary semantic metadata as and when they are required. We have adopted the strategy of storing the four primary spatio-temporal parameters (low-level descriptors) described above for each object in every frame, together with their corresponding derived parameters. In this way we expect to obtain all the advantages of maintaining low-level information (new derived parameters can be computed when necessary), and fast access and processing based on high-level information.

Figure 3. Relational tables for the storage of the semantic video metadata (rotating tethered bacteria case). The Content Item Identity Tables, Events Tables, Spatio-Temporal Position Tables and Media Entity Identity Tables are shown using the same colour scheme and general layout as for the items themselves in the metadata entity-relationship model (figure 2), and the relational connections between them are shown. Metadata parameters common to all videos are shown in normal font, while the derived metadata parameters specific for tethered bacterial cells are shown within the tables in italics, framed by dashed boxes. For simplicity, separate content item identity tables for objects and characters are not shown. Most ancillary metadata (e.g., author, video format, colour depth/dynamic range, and time-lapse ratio) are also not shown. For different video types, different derived semantic metadata parameters would be placed in the various tables in place of those shown here. For a wound healing video, for instance (see Section 3.1 below), those in the CONTENT ITEM IN FRAME table would be changed to healing rate and wound width. In all cases that table would hold the primary spatio-temporal metadata for position, size, shape and orientation acquired for all types of content item. PK: primary key (underlined); FK: foreign key.
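
A minimal sketch of how tables such as those of figure 3 might be declared, here using Python's built-in sqlite3 module; the table and column names are simplified, illustrative versions of those in the figure, and a single stored per-frame speed stands in for the case-specific derived columns.

    import sqlite3

    conn = sqlite3.connect("video_metadata.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS video (
        video_id         INTEGER PRIMARY KEY,
        filename         TEXT,
        frame_rate       REAL,
        time_lapse_ratio REAL       -- ancillary, but stored for convenience
    );
    CREATE TABLE IF NOT EXISTS content_item (
        item_id    INTEGER PRIMARY KEY,
        video_id   INTEGER REFERENCES video(video_id),
        label      TEXT,             -- e.g., 'bacterium 17'
        item_group TEXT,             -- e.g., 'tethered' or 'free'
        item_class TEXT              -- e.g., 'Rhodobacter sphaeroides'
    );
    -- primary spatio-temporal metadata: one row per content item per frame
    CREATE TABLE IF NOT EXISTS content_item_in_frame (
        item_id     INTEGER REFERENCES content_item(item_id),
        frame       INTEGER,
        x REAL, y REAL,              -- position
        size REAL, shape REAL, orientation REAL,
        speed REAL,                  -- a derived parameter, stored alongside
        PRIMARY KEY (item_id, frame)
    );
    CREATE TABLE IF NOT EXISTS event (
        event_id    INTEGER PRIMARY KEY,
        video_id    INTEGER REFERENCES video(video_id),
        category    TEXT,            -- e.g., 'tumble', 'drug administration'
        start_frame INTEGER,
        end_frame   INTEGER
    );
    """)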

It is worth emphasising that, while the primary spatial parameters are generic for all types of video, and closely follow the proposed MPEG-7 and SMPTE descriptors, descriptors within these standards are still not directly available for many of the higher-level derived behavioural parameters based upon them (the derived semantic metadata); however, general semantic tools are offered in the standards, and these tools can be applied to represent them (see Section 2.4).

Identification of activity states and events. Next, we address the most application-dependent activity of the analysis, namely that concerned with identifying the activity states of each character, and the events in which it is involved. For this, specific biological knowledge is required to define the activity states and events of biological relevance in each particular experiment.

In some cases, it will be important to record the movements of the object, for example changes in velocity in response to a change in an environmental variable such as temperature, turns in otherwise linear trajectories, or transitions between swimming and stationary, or between clockwise and counter-clockwise rotation. In other analyses, it will be important to determine the timing of significant life events such as cell division or cell death.

The derived semantic metadata required for different types of video analysis will vary. However, the metadata model permits semantic metadata relating to novel video items to be added easily, so that the same database system can be used for the analysis of a wide range of different types of videos [24, 25, 32]. The spatio-temporal parameters of the characters, objects and events thus determined, properly organized in a searchable database (see figure 3 for an instance of such database tables), allow subsequent queries to locate particular characters or events.

As our system is designed to enable informative descriptions of specific videos, each implementation of the metadata model for an individual video type will be specialised. The high-level semantic metadata are quite specific, and thus special attributes must be included in each database table to represent these case-specific object properties. In figure 3 we illustrate where specific attributes are placed for one particular video type. Apart from these necessary specific differences in table attributes, and the corresponding methods for extracting them from the videos using specific image analysis methods (described in Section 3), the metadata table organization is consistent and general, and each type of metadata is stored in its own place. Thus it is easy to adapt the model to any new video type [24, 25, 32]. This common organization allows a common and consistent method for querying and exploring the content while varying the concrete attribute names.

At present, situations that involve more than one cell type, such as the infection of a host cell by a parasite, or the induction of apoptotic cell death in a target cell by a cytotoxic effector cell [31], are too complex for fully automated video analysis. For such situations, we have developed a prototype alternative system named VANQUIS (video analysis and query interface system), which permits content analysis to be conducted in a semi-automated manner, using the superior pattern recognition skills of the human eye-brain system to permit the user to make the decisions, while employing the computer to speed the tasks of metadata annotation, database entry and subsequent query by content [15].

2.4. Metadata mapping to MPEG-7 descriptors

In the previous section (Section 2.3), we explained our metadata model for describing the semantic content of cell biological videos. Most desirable would be a method for interoperability and exchange of content descriptions between our metadata database system and other applications which have their own ways of dealing with them. The MPEG-7 standard [17, 18] is designed for just such a task.

In this section, we wish to comment upon the mapping of our model to MPEG-7. This mapping will allow the full interoperability of our system with other applications (content browsers, editors, searching engines, etc.) that follow the MPEG-7 standard.

Among the description tools, descriptors (Ds) and description schemes (DSs), defined in the MPEG-7 multimedia description schemes [18], our metadata model focuses on what MPEG-7 calls Content Description Tools, composed of Structural and Semantic aspects.

The Media Entities (videos, segments, frames) of our model (figure 2) relate directly to the VideoSegment DS, defining the temporal structure of the video in a hierarchical way. The spatio-temporal definitions of physical objects such as wounds or bacterial cells (the primary spatio-temporal metadata defining content item position, size, shape and orientation) relate directly to the MPEG-7 StillRegion and MovingRegion DSs.

The semantic aspects of our model include all the semantic and conceptual notions that rely upon objects, events, abstract notions and their relationships. Thus the object and event entities of figure 2 are described by the MPEG-7 Object and Event DSs, while the SemanticState DS allows one to describe and parameterise the semantic properties of an entity (bacterium, wound, event, etc.) at a given time and location. To describe entity relationships, there are a variety of semantic relation description tools: the ObjectObjectRelation DS to describe membership of content items within populations or groups; the ObjectEventRelation DS to describe the agents and objects involved in particular events; and the SegmentSemanticBaseRelation DS to determine the links between media segments (moving regions) and semantic entities (objects).

Nevertheless, we believe some extensions or specializations of the MPEG-7 descriptors are required in order to fully represent all our metadata. This can be done using the tools for extension offered by MPEG-7 and its DDL (Description Definition Language). For example, spatio-temporal semantic metadata (held within the Spatio-temporal Position tables in figure 3) should be represented by the SemanticState DS, in which the state is not a constant value of a scalar data type but a sequence of values changing in time. This can be done by extending the SemanticState DS to enable the holding of TimeSeriesType data types, rather than simple data type values only, as at present.
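
The fragment below sketches, in Python, what such a time-varying state might look like when serialised. Only the SemanticState name is taken from MPEG-7 as discussed above; the child elements (StateAttribute, TimeSeries, Value) and their attributes are hypothetical placeholders, and the output is not schema-valid MPEG-7.

    import xml.etree.ElementTree as ET

    # Illustrative only: apart from SemanticState, the element names below
    # are hypothetical placeholders, not schema-valid MPEG-7.
    state = ET.Element("SemanticState", id="bacterium17-position")
    ET.SubElement(state, "StateAttribute").text = "position"
    series = ET.SubElement(state, "TimeSeries", frameInterval="1")
    for frame, (x, y) in enumerate([(102.4, 88.1), (103.0, 87.6), (103.9, 87.2)]):
        value = ET.SubElement(series, "Value", frame=str(frame))
        value.text = f"{x} {y}"
    print(ET.tostring(state, encoding="unicode"))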

The precise mapping of our model to MPEG-7 is a work in progress, and will become the focus of our next endeavours.

3. Analyses and results

The application domain chosen for our demonstration of video analysis for subsequent query by content, using the semantic metadata described in the previous section, involves the analysis of scientific videos of cell biological specimens.

Our intention in this section is to illustrate the general applicability of our system to a variety of cell biological videos. The analysis procedure is designed to allow the reuse of image analysis and feature extraction functions whenever possible (e.g., the wound healing and bacterial culture videos use the same spatial analysis technique), or easily to include new image analysis modules when necessary to analyse different types of content. Using this analysis approach, and the metadata organizational model described above, five types of biological videos have been used as model subjects for this initial work, all derived from recordings of living cells using the light microscope.

3.1. Wound healing videos

The wound healing videos analysed were made using time-lapse video microscopy with phase contrast optics, to record the closure of in vitro wounds made in freshly confluent monolayers of cultured epithelial cells under a variety of experimental conditions, for example the transient perfusion of the cells with drugs that affect cytoskeletal polymerization [12–14]. For this type of video, questions of interest relate to the rates of wound healing, as measured by the reduction of the cell-free area, as healing proceeds under different drug regimes, in contrast, for example, to the rate of growth of the same type of cells at the free margins of unwounded colonies.

The automatic feature extraction procedure for this type of video is designed to measure the rate of wound healing by the progressive loss of open wound area (see figure 4). To this end, we have used computer image processing techniques to distinguish the uniform character of the cell-free open wound in each frame from those regions of cellular monolayer on either side, where the image texture is quite different, being characterized by high-frequency modulations of the image intensity.

Segmentation of the uniform wound region from the rest of the image is undertaken by using a differential operator that calculates the changes in the first spatial derivative of the image [8]. Further histogram equalization and image thresholding allow us to exclude high-contrast zones. The resultant threshold closely follows the faint margins of the ruffling wound-edge cells.

Once the image processing has been completed, an internal bounding box is used to compute, for each video frame, the wound area per unit length of wound, and the edge lengths. This internal bounding box, which is used as a safeguard to exclude end effects, is defined as a rectangle oriented parallel with the axis of the linear wound (computed by principal components), which lies fully within the video frame, and which is wider than the maximum wound width. The size and position of this bounding box (see figure 4) are kept constant for all the frames of the video that are to be analysed.
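
The measurement can be sketched as follows; this is an illustrative reimplementation using OpenCV rather than the authors' code, and the Otsu threshold stands in for the equalization-and-thresholding step tuned in the original.

    import cv2
    import numpy as np

    def open_wound_area(frame_gray, bbox):
        """Estimate the cell-free wound area (in pixels) inside a fixed
        bounding box: the monolayer carries high-frequency texture, the open
        wound is smooth, so low gradient magnitude marks the wound."""
        x0, y0, x1, y1 = bbox                  # box kept constant across frames
        roi = frame_gray[y0:y1, x0:x1]
        gx = cv2.Sobel(roi, cv2.CV_64F, 1, 0, ksize=3)   # first spatial derivatives
        gy = cv2.Sobel(roi, cv2.CV_64F, 0, 1, ksize=3)
        mag = cv2.magnitude(gx, gy)
        mag8 = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        mag8 = cv2.equalizeHist(mag8)          # histogram equalization
        # uniform (low-gradient) pixels -> open wound
        _, wound = cv2.threshold(mag8, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return int(np.count_nonzero(wound))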

The wound area within the bounding box is then treated as a single object of interest in the identity table, and no metadata are recorded concerning the positions and shapes of the individual cells.

Figure 4. Stages during in vitro epithelial cell wound healing. A gallery of video frames recorded at three points during one of the wound healing videos. Times are given as hh:mm:ss in the lower left corner of each frame, below the original date of the recording. The width of each image is approx. 300 µm. A clear and distinguishable difference in texture can be observed between the uniform character of the cell-free open wound and the regions of cellular monolayer on either side.

The change of this wound area per unit length of wound per minute permits the rate of wound closure, and hence the effectiveness of the healing process, to be calculated [25, 32]. Significant changes in the healing rate may be related to changes in the culture environment of the cells.

3.2. Bacteria proliferation videos

The second analysis is closely similar to the first, in that both measure the loss of cell-free area in successive video frames. Here, the in vitro proliferation of E. coli bacterial cells is recorded by time-lapse video microscopy using Nomarski optics (figure 5). Given favourable culture conditions, bacteria grow rapidly and undergo cell division approximately every half hour, each division resulting in the generation of two daughter cells from the original cell. When analysing this type of video, the objective is to determine the rates of cell proliferation at different stages in the culture.

The automatic feature extraction procedures used in this case are very similar to those used to measure wound healing in the first example, clearly demonstrating the re-usability of image processing software designed to this end. Visual inspection of the frames to be processed suggested the use of a texture-based algorithm to solve the problem of image segmentation, since the background lacks the high spatial frequency information present in the region occupied by the growing cells. Use of first-derivative information, and of a region-growing procedure to close small gaps in the cell region, allows the identification and grouping of non-uniform regions into a single area occupied by the cell colony. The outer perimeter of the colony defines an area that is then used as a measure approximating the number of cells in the colony at each moment. Occasionally, uniform cell-free regions are enclosed within the colony perimeter, leading to abrupt and artifactual increases in the estimated cell number. In these cases, a more precise method based on edge detection of the individual cells is applied. This method both identifies the edges of the population and eliminates any uniform internal regions.
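
A hedged sketch of this colony-area measurement, again with OpenCV (version 4 interface assumed) standing in for the authors' implementation; the kernel size, and the use of Otsu thresholding and morphological closing in place of the region-growing step, are illustrative choices.

    import cv2

    def colony_area(frame_gray):
        """Approximate the area occupied by the growing colony: textured
        (high-gradient) pixels are grouped into one region and small
        cell-free gaps inside it are closed."""
        gx = cv2.Sobel(frame_gray, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(frame_gray, cv2.CV_64F, 0, 1)
        mag8 = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
        _, textured = cv2.threshold(mag8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
        closed = cv2.morphologyEx(textured, cv2.MORPH_CLOSE, kernel)
        # the outer perimeter of the largest connected region approximates
        # the colony outline; its area tracks the cell number
        contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        return max((cv2.contourArea(c) for c in contours), default=0.0)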

Metadata from analyses of the wound healing and bacterial proliferation videos. Using the semantic metadata for the cell-free areas derived from the frame-by-frame analyses, the instantaneous rates of wound healing and of bacterial culture growth can easily be computed (figure 6), and from these the mean rates of wound healing or colony growth can be determined over any desired time period, or in response to drug treatment.

Figure 5. Stages in the early growth of an E. coli culture. An image sequence showing successive time-points during bacterial proliferation. The width of each image is approx. 20 µm, and the time interval between images is approximately 30 minutes.

Figure 6. Graphs showing rates of wound healing and bacterial growth. (A) Plots of the raw data obtained by fully automated analysis of changes in wound area with time for three separate wound healing experiments in which drug treatment was not applied. (B) The same data after filtering to remove noise caused by periodic dark frames in the original video (see [25] for details), superimposed on the results obtained in 1994 by manual analysis of the original analogue video recordings [12]. (C) The growth curve of a culture of E. coli cells obtained by fully automated analysis, showing an initial lag phase while the colony proliferates slowly from 1 to 500 cells, followed by a period of exponential growth from 500 cells to 17,000 cells (the so-called log phase, which gives a linear plot when the logarithm of the cell number is plotted against time), and ending in a gradual reduction in growth rate during the colony's multiplication from 17,000 to 25,000 cells, as more and more bacteria compete for dwindling nutrient resources.

In the wound healing study, for example, this is achieved by looking at the appropriate register in the spatio-temporal position table, where the open wound area is recorded for each frame in the video, with its associated derived metadata, the instantaneous wound healing rate. Changes in this rate in response to drug administration could be determined by searching the database for differences in the healing rate before and after the event of drug administration, the timing of which is recorded in the events table.

3.3. Bacterial motility videos

This set of analyses is of videos recording in real time (a) the movements of free-swimming Rhodobacter sphaeroides bacteria as their flagellar rotations change velocity or direction, and (b) the rotations of another population of the same species of bacteria that have been tethered to the glass coverslip of the microscope observation chamber using an anti-flagellin antibody, both made in the laboratory of Professor Judy Armitage of the University of Oxford [16]. For these types of video, questions of interest relate to the determination of individual and population statistics concerning the run lengths and velocities, and the frequencies, durations and patterns of tumbles, of free-swimming bacteria, and concerning the rotational directions and speeds of tethered bacteria, and the frequencies and patterns of their stops and reversals, correlated with changes of environmental conditions.

The bacterial motility videos contain large numbers of 'characters' (the bacteria) whose movements are independent of one another, presenting a high level of complexity for the analysis and metadata extraction. The commercial system presently in use by the creators of the videos for the analysis of bacterial motility [11] has severe limitations in the number of bacteria that can be simultaneously tracked, and in the extent of the data that can be analyzed and stored, both problems related to the fact that it is designed to work with limited hardware resources in real time, direct from an analogue video camera or a videotape.

In the first stage of our own off-line analyses of these videos [24, 32], an initial segmentation of the frame images is undertaken with due regard for the variations in background illumination between frames, using a dynamic thresholding procedure [1]. Subsequently, the sizes and shapes of the individual bacteria are determined using a region-growing algorithm, in which bacterial 'objects' are built from an initial seed point inside each bacterium. The orientation of the main axis of each elongated cell is determined by principal component analysis. For each cell, we can thus record its initial position, size, shape and orientation in space (figure 7A).
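
The per-cell descriptors can be sketched as follows: given a binary mask from the thresholding step, each labelled region yields a centroid, area, elongation and main-axis angle via PCA on its pixel coordinates. This is an illustrative numpy/scipy version, not the authors' region-growing code.

    import numpy as np
    from scipy import ndimage

    def describe_bacteria(binary_mask):
        """Return position, size, shape (elongation) and orientation for
        each segmented bacterium in a binary mask."""
        labels, n = ndimage.label(binary_mask)
        records = []
        for i in range(1, n + 1):
            ys, xs = np.nonzero(labels == i)
            # PCA on the pixel coordinates: the leading eigenvector of the
            # covariance matrix gives the cell's main axis
            cov = np.cov(np.vstack([xs, ys]))
            eigvals, eigvecs = np.linalg.eigh(cov)
            vx, vy = eigvecs[:, np.argmax(eigvals)]
            records.append({
                "position": (xs.mean(), ys.mean()),
                "size": xs.size,   # area in pixels
                "shape": float(np.sqrt(eigvals.max() / max(eigvals.min(), 1e-9))),
                "orientation": float(np.degrees(np.arctan2(vy, vx))),
            })
        return records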

For the videos of the free-swimming bacteria, the next step is to track the movements of the cells (figure 7B). The tracking problem can be defined as one of recognising the same object in consecutive frames of the video. The initial algorithm used to solve this problem is simple, and relies on the fact that any bacterium is likely to show a similar size, shape and orientation in adjacent frames of the video, and that its position in any frame is likely to be close to that in the preceding frame.

Figure 7. Automated identification and tracking of swimming bacteria. In the first stage of analysis, segmentation of the images is performed to detect individual bacteria in each frame. (A) All the single bacteria in the focal plane have been successfully detected and are shown in white, outlined, and with a principal axis vector drawn. Those bacteria that are clustered in small groups, where recognition of individual cells is difficult, and those bacteria above or below the focal plane, which appear as indistinct light or dark blobs, have intentionally been excluded from the selection. The trajectories of individual bacteria are then determined. The result of this procedure applied to a single bacterium is shown at the top of image A, where the bacterium's positions in five consecutive frames are shown superimposed upon an image of the initial frame (the arrow shows the direction of movement). (B) The central panel shows a typical composite plot of the trajectories of all the free-swimming bacteria in the field of view over a particular four-second period. It can be seen that some are swimming in relatively straight lines and some in tight curves, while others are tumbling or stationary. (C) A schematic representation of the strategy used to solve the incomplete trajectory problem. The two different tracks detected, which show bacterial positions at single-frame intervals, are actually sections of the same bacterial trajectory, broken when the bacterium swam out of the plane of focus. For truncated tracks such as these, the algorithm tries to find corresponding truncated tracks and link them [see 24 for details].

Application of this algorithm results in the determination of bacterial trajectories, from which parameters such as speed, direction and path curvature may be calculated.

However, since the individual bacteria are swimming unrestricted in three dimensions in the space between the microscope slide and the overlying coverslip, they may stray from the narrow focal plane of the microscope objective lens and become temporarily lost from view. This causes them to be missed in some frames by the initial segmentation and cell recognition algorithms, causing fragmentation of their trajectories. Since for the scientific analysis of bacterial movement it is important to have trajectories that are as long as possible, there is a need to link partial or broken trajectories into longer continuous ones. This is achieved by a post-processing algorithm that checks, for every partial trajectory that ends, whether there is another partial trajectory which is spatially adjacent, which starts within an appropriate time interval after the ending of the first trajectory (a few frames later), and which matches the first one in features such as speed and direction, and the shape and size of the bacterium. If these conditions are fulfilled, the 'two' bacteria originally identified are recognized to be the same cell, and the two trajectories are linked to form a longer one (see figure 7C).
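
The matching and linking rules just described can be sketched as follows; the distance and gap thresholds are illustrative, and for brevity only position is compared, whereas the actual procedure also compares speed, direction, shape and size.

    import math

    def match_next_frame(items_t, items_t1, max_dist=15.0):
        """Greedy frame-to-frame matching: pair each detection with the
        nearest unused detection in the next frame, if close enough.
        Items are dicts with a 'position' entry (x, y)."""
        matches, used = {}, set()
        for i, a in enumerate(items_t):
            best, best_d = None, max_dist
            for j, b in enumerate(items_t1):
                if j not in used:
                    d = math.dist(a["position"], b["position"])
                    if d < best_d:
                        best, best_d = j, d
            if best is not None:
                matches[i] = best
                used.add(best)
        return matches

    def can_link(track_a, track_b, max_gap=5, max_dist=20.0):
        """Post-processing rule for broken trajectories: link track_b onto
        track_a if it starts a few frames after track_a ends, and nearby."""
        gap = track_b["start_frame"] - track_a["end_frame"]
        d = math.dist(track_a["end_position"], track_b["start_position"])
        return 0 < gap <= max_gap and d <= max_dist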

For the free-swimming bacteria, the important high-level events to detect are the transitions between their three principal behavioural states (figure 8(A)): the forward swimming state, when all the bacterial flagella are rotating counter-clockwise, which is characterized by a smooth continuous curvilinear movement; the tumbling state, in which clockwise rotation of the flagella causes the bacterium to gyrate randomly; and the stationary state.

Figure 8. Behavioural states and event detection in bacterial swimming. (A) The positions of three different bacteria are shown in four successive frames. By analysing the linear distance covered, the directional change in trajectory, and the change in cellular orientation during a brief interval, it is possible to determine the current behavioural state of each bacterium. Bacterium A is swimming, bacterium B is tumbling, and bacterium C is stationary. A degree of flexibility (percentage hysteresis) is applied to the thresholds that separate these states, in order to avoid 'noisy' behaviours of cells close to a threshold. (B) The trajectory of a single bacterium plotted over ten seconds, in which the system has automatically detected five tumbles (marked with boxes).

This last state is the easiest to define, and is used to categorise all bacteria with translational velocities less than a very low threshold value. To distinguish the forward swimming state from the tumbling state, we use a sliding time window of w frames (typically 8–10 frames), over which we compute the ratio between the total distance travelled (i.e., the sum of all the inter-frame distances) and the overall displacement (i.e., the linear distance between the initial and final positions). Values near to one characterize forward swimming, while higher values represent the tumbling state.
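
This classification rule can be sketched directly; the window length, speed threshold and path/displacement ratio threshold below are illustrative values, and the hysteresis margin described in figure 8 is omitted for brevity.

    import math

    def classify_state(points, fps=25.0, v_stop=1.0, ratio_tumble=1.5):
        """Classify a cell's state over a sliding window of (x, y) positions,
        one per frame (w = len(points), typically 8-10)."""
        path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
        displacement = math.dist(points[0], points[-1])
        speed = path / ((len(points) - 1) / fps)       # mean speed, pixels/s
        if speed < v_stop:
            return "stationary"
        ratio = path / max(displacement, 1e-9)         # near 1 -> straight path
        return "swimming" if ratio < ratio_tumble else "tumbling"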

For each identified bacterium, the system determines and stores the semantic metadata relating to the cell's spatio-temporal trajectory. From these primary metadata one can calculate derived parameters such as the instantaneous velocity; the duration, initial direction and path curvature of the individual forward swimming trajectories; and the frequency, duration and patterns of tumbles and stops. Figure 8(B) shows a typical example of bacterial tracking in which five tumbles have been detected, interrupting forward swimming and causing random changes in direction. The motility parameters can be correlated with details of the environmental conditions pertaining at the time. In addition, summary statistical metadata may be produced describing the average motility of the whole bacterial population in the video.

For the rotating tethered bacteria, the task of identifying the same cell in successive video frames is obviously more straightforward, and the salient features to record from such videos are the instantaneous speed, handedness and duration of each rotation, accelerations and decelerations, the frequency of reversals, and the duration of stops. These metadata are then stored in the appropriate identity, spatio-temporal position and events tables, as detailed in figure 3. Results from such analyses are not shown here, but are presented in figure 5 of [24].

Examples of queries that can be made using the semantic metadata derived from videos of the free-swimming bacteria are: "Identify all the video clips containing any bacteria swimming at mean individual velocities of at least x µm per second," and "Find me the video sequences in which, after the administration of drug A, the average tumble frequency decreases by more than 30%." For the first query, a simple inspection of the appropriate spatio-temporal position table permits identification of the video frames containing bacteria with individual velocities, averaged over the preceding 25 frames (1 second), above x µm per second. The second question requires a calculation of the average tumble frequency of the population of bacteria before and after the event of administration of drug A, determined from the times of the starts of all bacterial tumbles and of drug administration recorded in the events table.
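
The first query can be sketched against the illustrative schema of Section 2.3; the window of 24 preceding rows plus the current row gives the 25-frame (1 second) average, and an SQLite version with window functions (3.25 or later) is assumed.

    import sqlite3

    def frames_with_fast_bacteria(db_path, v_min):
        """Frames containing any bacterium whose speed, averaged over the
        preceding 25 frames, exceeds v_min."""
        conn = sqlite3.connect(db_path)
        rows = conn.execute("""
            SELECT DISTINCT frame FROM (
                SELECT frame,
                       AVG(speed) OVER (PARTITION BY item_id ORDER BY frame
                                        ROWS 24 PRECEDING) AS avg_speed
                FROM content_item_in_frame
            )
            WHERE avg_speed > ?
            ORDER BY frame
        """, (v_min,)).fetchall()
        return [r[0] for r in rows]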

3.4. Human sperm motility videos

Real-time recordings of human sperm motility are widely used to assess the potential ability of human semen to fertilize human eggs, there being a good correlation between sperm movement characteristics and fertilization success. Several characteristics are involved in studies of this type, including the sperm 'count' (i.e., the total cell density, typically required to be greater than 20 million per ml for fertility) and the 'motile density' (i.e., the density of motile sperm, typically required to be greater than 8 million motile cells per ml).

The motile density is perhaps the more important parameter, as it reflects the number of sperm capable of progressing through the female reproductive tract from the site of deposition near the cervix to the site of fertilization in the Fallopian tube.

For this type of video recording, questions of interest relate to changes in the mean swimming velocity of the sperm, and in the proportion of motile sperm, determined at hourly intervals over a period of up to twenty-four hours from receipt of the specimen. It is well known that sperm motility is temperature dependent, so the handling and processing of specimens is critical, requiring that specimens be evaluated only under tightly controlled laboratory conditions.

The automatic feature extraction procedures used in this case are very similar to those used to measure bacterial motility (Section 3.3). Both types of video contain large numbers of characters that show a variety of behaviours. For these videos, questions of interest relate to the determination of individual and population statistics concerning run lengths and velocities, and the frequencies, durations and patterns of circular and linear movements, measured by the rotational directions and linear speeds of the spermatozoa.

Figure 9. Phase contrast images of human sperm. Every tenth video frame is shown in the first three images, illustrating the motility of a single human sperm cell in vitro. The cell is approximately 70 µm long. The right-hand image, which is a superposition of seven such frames, shows the unusual circular trajectory of this particular cell, which is rotating once every 1.62 seconds.

Figure 10. Analysis of the circular trajectory of a single sperm cell (the one shown in figure 9). The left-hand graph tracks the position of the cell in each video frame over a five-second period, during which it makes more than three revolutions. On the right, we represent the cumulative angular changes for this cell. The slope of the fitted line gives the mean rotational speed of the cell; in this case the slope is 8.91 degrees/frame, giving a mean rotational speed of 0.62 Hz.

Here we present (due to length constraints) only a single analysis, of a circular trajectory movement (figures 9 and 10). After an initial segmentation of the frame images that allows the identification of the individual spermatozoa (position, size, shape and orientation of the main axes), a tracking procedure is applied to determine the trajectories, as explained in Section 3.3.
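
The rotational-speed estimate of figure 10 can be sketched as follows; the 25 frames-per-second rate is an assumption of the example. Unwrapping the per-frame orientation angles gives the cumulative angle, and the slope of a fitted line converts to revolutions per second.

    import numpy as np

    def rotational_speed_hz(angles_deg, fps=25.0):
        """Mean rotational speed from per-frame orientation angles (degrees)."""
        cumulative = np.degrees(np.unwrap(np.radians(angles_deg)))
        frames = np.arange(len(angles_deg))
        slope_deg_per_frame, _ = np.polyfit(frames, cumulative, 1)
        return slope_deg_per_frame * fps / 360.0

    # e.g., a slope of 8.91 degrees/frame at 25 fps gives 8.91 * 25 / 360 ≈ 0.62 Hz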

4. Conclusion

In this work, we have devised a general framework for the analysis and storage of metadata describing the semantic content of cell biological research videos, and have employed a variety of algorithms for feature extraction from the digitized videos. Based on the proposed metadata model, we have developed an automatic analysis and annotation system for this type of scientific video recording.

We have named this prototype analytical system VAMPORT (video automated metadata production by object recognition and tracking). It permits the generation of information of high intrinsic value by intelligent analysis of the visual content of moving image data. In all cases, the results we have obtained have been validated by alternative conventional methods of analysis, and we have observed significant increases in both accuracy and efficiency when undertaking such analyses automatically rather than by hand.

For the studies of epithelial wound healing and bacterial growth, useful quantitative data have been obtained simply by determining the change in the cell-free area with time. For our analysis of the motility of individual cells, we have determined primary semantic metadata relating to four related spatial parameters: the position, size, shape and orientation of each identified cell in each video frame. These metadata descriptors closely follow the approved recommendations of the MPEG-7 standard for representing information about video content.

From these primary spatial metadata, and their variations along the time axis of the video, we have computed a variety of high-level features related to its semantic content, described by a set of derived parameters. Furthermore, from detailed knowledge of their biology, we have been able to define the potential behavioural states of particular cells, permitting the automated identification of transition events between these states. All of these primary and derived semantic metadata parameters relating to individual cells can be combined or averaged to determine group characteristics.

The semantic metadata extracted in this way are stored and organised in a relational database, over which a content-based query and retrieval prototype has been built. Subsequent queries on these metadata may be used to obtain important factual and analytical information, and to retrieve selected video sequences matching the query criteria. As the result of a successful query, the system returns to the user a list of video files, together with details of the relevant frame numbers, allowing video clips matching the query to be retrieved.

As an extension of this study, the semantic metadata stored in the database may be subjected to further analysis and mining algorithms to produce more elaborate knowledge, for example the discovery of common patterns in the motility behaviour of two species of bacteria, or the classification of a single bacterial population into two classes (e.g., normal and non-tumbling mutants). In the future, such data mining will be increasingly important in many areas of biomedical and pharmaceutical research, but can only be truly successful and widespread if the initial analyses and metadata encodings are all made to conform to recognised international standards such as MPEG-7, to ensure true interoperability between separate data sets and databases. Since, in the framework of MPEG-7 development, high-level semantic metadata descriptors such as those we have described are still under definition and validation, the data model and the descriptors that we propose here may be of value in refining the emerging definition of these important standards.

Acknowledgments

We are most grateful to Professor Judy Armitage of the Bacteriology Unit, Department of Biochemistry, University of Oxford, and to Dr. Jim Sullivan of Cells Alive (http://www.cellsalive.com), for permitting us to use as test data for our analyses their video recordings of bacterial motility, and of sperm motility and bacterial proliferation, respectively, and to Dr. Philippe Salembier of the Image Processing Group, Universidad Politecnica de Catalunya, for valuable comments on an early version of this manuscript.

DMS acknowledges with gratitude the role of the European Commission, as part of its IV Framework Biotechnology Infrastructure Project BIO4-CT96-0472, in providing computational equipment and research funding used in part of this research. The work of the Computer Architecture Department (University of Malaga) has been partially supported by grant 1FD97-0372 from the EU-FEDER Programme (Fondos Europeos de Desarrollo Regional).

References

1. D.H. Ballard and C.M. Brown, Computer Vision, Prentice Hall, 1982, pp. 143–146.

2. T. Boudier and D.M. Shotton, "Video on the Internet: An introduction to the digital encoding, compression and transmission of moving image data," J. Struct. Biol., Vol. 125, pp. 133–155, 1999.

3. J.M. Carazo and E.H.K. Stelzer, "The BioImage database project: Organizing multi-dimensional biological images in an object-relational database," J. Struct. Biol., Vol. 125, pp. 97–102, 1999.

4. N. Day, MPEG-7 Applications Document, ISO/IEC JTC1/SC29/WG11 N4676, 2002. Available at: http://mpeg.telecomitalialab.com/working_documents.htm#MPEG-7.

5. N. Day and J.M. Martínez, Introduction to MPEG-7 (v4.0), 2002. Available at: http://mpeg.telecomitalialab.com/working_documents.htm#MPEG-7.

6. A. Del Bimbo, V. Castelli, S.-F. Chang, and C.-S. Li (Eds.), Content Based Access of Images and Video Libraries, Special Issue of Computer Vision and Image Understanding, Vol. 75, pp. 1–213, 1999 (doi:10.1006/cviu.1999.0775).

7. GlaxoWellcome Experimental Research Group, The Global Image Database, formerly available at: http://www.gwer.ch/qv/gid/gid.htm, but discontinued following the merger with SmithKline Beecham.

8. N. Guil, J.M. Gonzalez, and E.L. Zapata, "Bidimensional shape detection using an invariant approach," Pattern Recognition, Vol. 32, pp. 1025–1038, 1999.

9. A. Hampapur, A. Gupta, B. Horowitz, S. Chiao Fe, C. Fuller, J. Bach, M. Gorkani, and R. Jain, "The Virage video engine," in Proceedings of the SPIE, the International Society for Optical Engineering, Vol. 3022, 1997, pp. 188–198.

10. Virage Inc., http://www.virage.com.

11. G. Hobson, Hobson Tracker User Manual, Hobson Tracking Systems Ltd., 697 Abbey Lane, Sheffield S11 9ND, UK, 1996.

12. J.W. Lewis, "The effects of colchicine and brefeldin A on the rate of experimental in vitro epithelial wound healing," Undergraduate research project dissertation, University of Oxford, 1994.

13. J.W. Lewis and D.M. Shotton, "Effects of colchicine and brefeldin A on the rate of experimental in vitro epithelial wound healing," in Proc. 4th Annual Meeting European Tissue Repair Soc., Oxford, 1994, p. 230.

14. J.W. Lewis and D.M. Shotton, "Time-lapse video microscopy of wound healing in epithelial cell monolayers: Effects of drugs that induce microtubule depolymerization and Golgi disruption," Proc. Roy. Microscop. Soc., Vol. 30, pp. 134–135, 1995.

15. J. Machtynger and D.M. Shotton, "VANQUIS, a system for the interactive semantic content analysis and spatio-temporal query by content of videos," J. Microscopy, Vol. 205, pp. 43–52, 2002.

16. M. Manson, J. Armitage, J. Hoch, and R. Macnab, "Bacterial locomotion and signal transduction," J. Bacteriology, Vol. 180, pp. 1009–1022, 1998.

17. J.M. Martínez (Ed.), Overview of the MPEG-7 Standard, ISO/IEC JTC1/SC29/WG11 N5525, 2003. Available at: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm.

18. J.M. Martínez, R. Koenen, and F. Pereira, MPEG-7: The Generic Multimedia Content Description Standard, 2002. Available at: http://www.chiariglione.org/mpeg/events&tutorials/IEEEMM_mp7overviewwithcopyright.pdf.

19. F. Nack and A.T. Lindsay, "Everything you wanted to know about MPEG-7: Part 1," IEEE Multimedia, Vol. 6, pp. 65–77, 1999.

20. National Centre for Geographic Information, Ministerio de Fomento, Spain, Cartographic Publishing and Service Catalogue, 2000. Available at: http://www.cnig.ign.es.

21. S. Panchanathan and B. Liu (Eds.), Indexing, Storage, Browsing, and Retrieval of Images and Video, Special Issue of Journal of Visual Communication and Image Representation, Vol. 7, pp. 305–423, 1996 (doi:10.1006/jvci.1996.0026).

22. N. Paskin, "Towards unique identifiers," Proceedings of the IEEE, Vol. 87, pp. 1208–1227, 1999.

23. N. Paskin, The DOI Handbook, 2000. Available at: http://www.doi.org/handbook_2000/index.html.

24. A. Rodríguez, D.M. Shotton, N. Guil, and O. Trelles, "Automatic tracking of moving bacterial cells in scientific videos," in Proc. RIAO 2000, 6th Conference on Content-Based Multimedia Information Access, Collège de France, Paris, April 2000. Available at: http://www.ac.uma.es/inv-des/inves/reports/2000.html.

25. A. Rodríguez, D.M. Shotton, O. Trelles, and N. Guil, "Automatic feature extraction in wound healing videos," in Proc. RIAO 2000, 6th Conference on Content-Based Multimedia Information Access, Collège de France, Paris, April 2000. Available at: http://www.ac.uma.es/inv-des/inves/reports/2000.html.

26. G. Rust, "Metadata: The right approach," D-Lib Magazine, July 1998. Available at: http://www.dlib.org/dlib/july98/rust/07rust.html.

27. G. Rust and M. Bide, "The INDECS metadata framework: Principles, model and data dictionary," WP1a-006-2.0, June 2000.

28. P. Salembier, "Audiovisual content description and retrieval: Tools and MPEG-7 standardization activities," Tutorial, IEEE Int. Conf. on Image Processing (ICIP-2000), Vancouver, Canada, 10 Sept. 2000.

29. P. Salembier, J. Llach, and L. Garrido, "Visual segment tree creation for MPEG-7 Description Schemes," Pattern Recognition, Vol. 35, pp. 563–579, 2002.

30. A. Sanfeliu, J.J. Villanueva, M. Vanrell, R. Alquezar, J.-O. Eklundh, Y. Aloimonos, A.K. Jain, J. Kittler, T. Huang, J. Serra, J. Crowley, and Y. Shirai (Eds.), Proc. ICPR2000, 15th International Conference on Pattern Recognition, Barcelona, Sept. 4–7, 2000. Vol. 1, pp. 1–1134: Computer vision and image analysis; Vol. 2, pp. 1–1076: Pattern recognition and neural networks; Vol. 3, pp. 1–1165: Image, speech and signal processing; Vol. 4, pp. 1–888: Applications, robotic systems and architectures.

31. D.M. Shotton and A. Attaran, "Variant antigenic peptide promotes cytotoxic T lymphocyte adhesion to target cells without cytotoxicity," Proc. Natl. Acad. Sci. USA, Vol. 95, pp. 15571–15576, 1998.

32. D.M. Shotton, A. Rodríguez, N. Guil, and O. Trelles, "Analysis and content-based querying of biological microscopy videos," in Proc. ICPR2000, 15th International Conference on Pattern Recognition, Barcelona, September 4–7, 2000. Available at: http://www.ac.uma.es/inv-des/inves/reports/2000.html.

33. D.M. Shotton, A. Rodríguez, N. Guil, and O. Trelles, "A metadata classification schema for semantic content analysis of videos," J. Microscopy, Vol. 205, pp. 33–42, 2002.

34. J.L. Sussman, D. Lin, J. Jiang, N.O. Manning, J. Prilusky, O. Ritter, and E.E. Abola, "Protein Data Bank (PDB): Database of three-dimensional structural information of biological macromolecules," Acta Crystallogr. D, Vol. 54, pp. 1078–1084, 1998.

Andrés Rodríguez received the M.Sc. and Ph.D. degrees in computer science from the University of Malaga, Spain, in 1995 and 2000, respectively. He joined the Computer Science Faculty of the University of Malaga in 1996, where he is now an associate professor within the Department of Computer Architecture. His research interests include video processing and video content description, data mining, and query by content in multimedia databases.

Nicolás Guil received the B.S. degree in physics from the University of Sevilla, Spain, in 1986 and the Ph.D. degree in computer science from the University of Malaga in 1995. From 1990 to 1997 he was an assistant professor at the University of Malaga. Currently, he is an associate professor in the Department of Computer Architecture at the University of Malaga. His research interests are in the area of video and image processing: tools for video indexing, automatic and semi-automatic video segmentation, image retrieval, and active contours.

David M. Shotton received the B.Sc., M.Sc., and Ph.D. degrees from the University of Cambridge during the late 1960s. He subsequently worked at Bristol University, the University of California at Berkeley, Harvard University, the Weizmann Institute of Science, and Imperial College London, before joining the Biological Sciences Faculty of the University of Oxford in 1981, where he is now Reader in Image Bioinformatics within the Department of Zoology.


His cell biological research primarily involves the use of time-lapse video microscopy and freeze-fracture electron microscopy to investigate cellular function. He has published extensively on light and electron microscopy techniques, and has taught frequently on international microscopy courses. He is a founding partner of the original BioImage Database consortium, and recently established the Image Bioinformatics Laboratory within the Department of Zoology to handle and analyze digital images and videos of biological specimens. He is presently director of the BioImage Database development project (http://www.bioimage.org) within the EC 5th Framework ORIEL Project (http://www.oriel.org), and is also director of VideoWorks for the Grid, a U.K. e-Science Testbed Project (http://www.videoworks.ac.uk), within which runs VIDOS (http://www.vidos.ac.uk), a Web-based video editing and customization service.

His current research interests include ontology development for image databases and for animal behaviour, andsemantic content description and query-by-content of scientific videos.

Oswaldo Trelles received the B.S. degree in industrial engineering from the Universidad Catolica of Peru, the M.S. degree in computer science from the Polytechnical University of Madrid, Spain, and the Ph.D. degree for work on the parallelization of biological sequence analysis algorithms on supercomputers.

Currently he is an associate professor in the Department of Computer Architecture at the University of Malaga, Spain, where he teaches the fundamentals and design of operating systems in the Computer Science faculty. His main research interests include parallel paradigms for distributed- and shared-memory parallel computers, and graphical support for exploratory analysis. Data mining and automatic knowledge discovery in biological data are also of great interest in his research.