From Multimedia to the Semantic Web using MPEG-7 and Computational Intelligence

G. Tummarello, C. Morbidoni, P. Puliti, A. F. Dragoni, F. Piazza
Dipartimento di Elettronica Intelligenza Artificiale e Telecomunicazioni

Universita’ Politecnica delle Marche (Ancona, ITALY)
[email protected]

Abstract

In this paper we present an architecture that provides Semantic Web annotations of sound clips described by MPEG-7 audio descriptions. The great flexibility of the MPEG-7 standard makes it especially difficult to compare descriptions coming from heterogeneous sources. To cope with this, the architecture first obtains “normalized” versions of the audio descriptions using different adaptation techniques. Once in a “normalized” format, descriptions can then be projected into uniform and semantically relevant vector spaces, ready to be processed by a variety of well-known computational intelligence techniques. As higher-level semantic results become available, they can be exported as interoperable (RDF) annotations about the resource that was originally fed into the system.

As a novel aspect, through the use and interchange of MPEG-7 descriptions, the framework allows building applications (e.g. classifiers) which can provide annotations on distributed audio resource sets.

1. Introduction

Ever since its first version, published in 2001 [1], MPEG-7 has featured a very large number of descriptors allowing the annotation of audio/video sequences with information ranging from low-level statistical features to purely semantic concepts. Although the standard was developed along with an experimental codebase (the so-called XM, or experimentation model [2]), there is currently a notable lack of actual applications using it. This sort of initial “implementation flop” can be explained by looking at the complexity of the standard from the interoperability standpoint. In fact, while it is easy to create MPEG-7-compliant descriptions, the freedom in terms of structures and parameters is such that understanding MPEG-7 produced by others is difficult at best.

While computational intelligence techniques map directly onto the applications devised for the standard [11], actually using them is not straightforward. As MPEG-7 descriptions of an identical object can in fact be very different from each other when coming from different sources, with the sole requirement being syntactical “MPEG-7 compliance”, special care is needed so that they can be projected into a uniform vector space. While it would always be possible to “filter out” sources which do not comply with a simplified, pre-specified, fixed MPEG-7 subset, it is clear that having a database as large as possible leads to more interesting results.

Recognizing the intrinsic difficulty of full interoperability, work is currently under way [3] to standardize subsets of the base features as “profiles” for different specific applications, trading off generality and expressivity in favor of ease and lightness of implementation. Necessarily, this also means giving up interesting scenarios, especially as multimedia distribution systems and clients are more and more network applications (as opposed to standalone “CD player”-like appliances), expected to comply with a tolerant, decentralized (Semantic) Web philosophy and architecture [13].

2. Related work

While there are many works related to specific areas of audio classification and intelligent processing (among the most recent, [14][15]), only a few use MPEG-7 exclusively as base features, and virtually all address very specific use cases.

In [16], semantic information from audio files is used to allow browsing of libraries. However, the system appears to be fully centralized, requiring all the audio files to be available to the server as a prerequisite, and only partially relying on standard MPEG-7 features.

In [12] as well as in [17], MPEG-7 low-level descriptors (LLDs) are used, but often in association with others that are not currently in the standard. The features used, as well as the database structures crafted to solve the specific problems (identification of musical instruments by timbre and audio fingerprinting, respectively), appear to perform well, but still require the complete set of audio files in order to extract metadata uniformly. In [20], the specific problem of efficient audio matching is addressed by means of indexes created by reducing the dimensionality of descriptors selected solely from MPEG-7. While this latest work can at least be considered a pure MPEG-7 application (that is, capable of operating without the actual audio data), the streams used as inputs must strictly match in structure and parameters, and the whole structure is aimed precisely at a single use case.

More general tools are considered in [4], where the interoperability and manageability of MPEG-7 structures in a large number of existing XML databases are evaluated. The results show that the hierarchical structures spawned by the standard have such specific semantics and requirements that even the best systems appear to be of very limited use (if any at all, when LLDs are involved).

Higher-level interoperability is studied in [8] by means of a mapping ontology, especially between space/time segments and the union/disjunction operators of OWL (the Semantic Web ontology language). While this indeed succeeds in making the higher-level MPEG-7 descriptors interoperate on the Semantic Web, no mention is made of how to map the features that are machine extractable.

Instead of concentrating on the details of MPEG-7 performance in specific use cases (e.g. genre classification, matching, etc.), in this paper we focus on MPEG-7 low-level interoperability and its merging with the Semantic Web. Steps in this direction include:

• Usage of standard web Uniform Resource Identifiers (URIs) both to identify and to annotate resources;

• A tolerant, decentralized web philosophy, with no assumption about the availability of the actual sound source (direct extraction as a last resort);

• No assumptions about the origin and format of existing MPEG-7 descriptions: the system transparently adapts to different hop sizes, missing LLDs (by cross prediction) and different ways of representing data (e.g. Spectrum Envelope vs. Basis + Projection [1]);

• Transparent use of different LLDs to improve the quality of each requested LLD with regard to the common distortions caused by popular net distribution formats (e.g. MP3);

• Generation of results that are properly interoperable on the Semantic Web, using both RDF and an appropriate OWL ontology.

The architecture presented here is therefore a framework which naturally allows developing Semantic-Web-enabled applications using existing MPEG-7 algorithms. Section 8 shows an example of such an application.

3. Architectural overview

The proposed architecture (as currently implemented in the MPEG7DB project [7]) is depicted in Figure 1.

Figure 1 The overall structure of the proposed architecture. URIs are used to indicate which audio files are going to be considered in the database. Once MPEG-7 descriptions are provided, their projections are fed to computational intelligence techniques.

URIs are used both as references to the audio files and as subjects of the annotations produced in standard RDF/OWL format.

The MPEG-7 Audio Compact Type (ACT) DB fetches MPEG-7 descriptions of the selected URIs as described in section 4. The MPEG-7 ACT type used inside the DB is both memory and computationally efficient (as opposed to the original XML structure) and abstracts from the fine details of the LLDs (such as hop size and the many possible alternatives of the scalar and vector series MPEG-7 descriptors [1]). The projection block, described in section 5, provides projections of the MPEG-7 ACT into metric spaces as needed by the specific application built upon this framework. In doing so, it abstracts from the specific structure of the original MPEG-7 description by using a recursive adaptation technique.

Once a set of uniform projections has been obtained for the descriptions within the database, classical computational intelligence methods can be applied to fulfill the desired task. Although fully functional, the computational intelligence blocks included in the framework (classifiers and approximators, including some based on Multi-Layer Perceptron neural networks) and used in the example application (section 8) can be substituted with any vector-based technique as needed.

Finally, once higher-level results have been inferred (e.g. the piece with URI “file://c:/MyLegalMusic/foo.mp3” belongs to genre “punk ballade”), they can be saved into the provided semantic containers which, hiding all the complexity, produce RDF annotations using terms given in an appropriate ontology (in OWL notation). Before outputting the annotation stream, the system makes sure that local URIs (e.g. “file://foo.mp3”) are converted into globally meaningful formats like binary-hash-based URIs (e.g. “urn:md5:”, “ed2k://”, etc.).
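For instance, the conversion from a local file URI to a content-based “urn:md5:” URI can be sketched as follows. This is a hypothetical helper using only standard Java APIs, not the project's actual code:

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class UriNormalizer {

    /** Turns a local "file://" URI into a content-based "urn:md5:" URI. */
    public static String toGlobalUri(URI localUri) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(Paths.get(localUri))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);   // hash the raw audio bytes
            }
        }
        StringBuilder uri = new StringBuilder("urn:md5:");
        for (byte b : md5.digest()) {
            uri.append(String.format("%02x", b & 0xff));
        }
        return uri.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints something like "urn:md5:9e107d9d372bb6826bd81d3542a419d6"
        System.out.println(toGlobalUri(URI.create("file:///tmp/foo.mp3")));
    }
}
```

Hash-based URIs of this kind stay valid wherever the same audio bytes are found, which is what makes the annotations reusable across a distributed resource set.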

4. From URI to MPEG-7 ACT

When the database component is given a URI indicating a new audio clip to index, it first tries to locate an appropriate MPEG-7 resource describing it. At this point it is possible to plug in several alternative models of metadata retrieval, among which: calls to Web Services, queries on distributed P2P systems (P2P exchange of metadata being a very promising field of study [18]), or lookups in a local storage or cache.

If this preliminary search fails to locate the MPEG-7 file, a similar mechanism attempts to fetch the actual audio file (if the URI turns out to be a resolvable URL) and process it with the included, co-developed MPEG7ENC library [6].
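A minimal sketch of this lookup cascade, assuming a common resolver interface; all class names here are illustrative, not the actual MPEG7DB API:

```java
import java.net.URI;
import java.util.List;

/** Placeholder for a parsed MPEG-7 audio description. */
class Mpeg7Description { }

/** A source of MPEG-7 descriptions; returns null if it cannot resolve the URI. */
interface Mpeg7Resolver {
    Mpeg7Description resolve(URI audioUri);
}

/**
 * Tries each source in order: e.g. local cache, web services, P2P systems,
 * and finally a resolver that fetches the audio itself and encodes it.
 */
class CascadingResolver implements Mpeg7Resolver {
    private final List<Mpeg7Resolver> sources;

    CascadingResolver(List<Mpeg7Resolver> sources) {
        this.sources = sources;
    }

    @Override
    public Mpeg7Description resolve(URI audioUri) {
        for (Mpeg7Resolver source : sources) {
            Mpeg7Description found = source.resolve(audioUri);
            if (found != null) {
                return found;   // first source that knows the clip wins
            }
        }
        return null;   // nothing found and the audio was not fetchable
    }
}
```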

Once retrieved, the schema-valid MPEG-7 is parsed recursively so that the basic “stripes” of data belonging to Low Level Descriptors are mapped into flat, name-indexed array structures. These not only serve as a convenient and compact container, but also provide abstraction from some of the basic freedoms of description allowed by MPEG-7. Among these, the MPEG-7 ACT type provides basic time interpolation/integration capabilities to handle the cases where LLDs have different sampling periods (even inside the same description block; see the “scaling” in [1]) and different grouping operators applied (e.g. AudioWaveformType can be expressed as raw values or as two grouped series of “Max” and “Min”). Currently, the MPEG-7 ACTs are indexed only by the URI that they describe, but this object is where indexes specific to the final application, such as those in [20], would fit.

Figure 2 The MPEG-7 Audio Compact Type database contains compact MPEG-7 representations associated with the URI of the original audio data. New audio segments are inserted simply by providing a corresponding URI. The database first tries to locate an existing MPEG-7 description (either locally or by querying remote sources), but it is also able to fetch the original audio and encode it if needed.

Figure 3 Cross estimation of MPEG-7 descriptors by means of a Multi-Layer Perceptron. For each feature, the three others represented here are given as sources of the estimation.

Figure 4 A feature space is specified in terms of selected features (e.g. “Audio Power”, “Spectrum Spread”, etc.) and projection operators (e.g. “max”, “mean”, possibly after filtering). MPEG-7 ACT data to be projected is processed through a recursive system capable both of cross predicting a missing feature and of providing an improved version of an existing one (if the sources are encoded in lossy formats).
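The flat, name-indexed container idea can be sketched as follows. This is a hypothetical simplification of the actual MPEG7DB classes, assuming linear interpolation as the resampling strategy:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the "Audio Compact Type": LLD time series stored as flat,
 * name-indexed float arrays, resampled so that series extracted with
 * different hop sizes become comparable.
 */
class AudioCompactType {
    // e.g. "AudioPower" -> one value per frame
    private final Map<String, float[]> series = new HashMap<>();

    void put(String lldName, float[] values) {
        series.put(lldName, values);
    }

    /** Returns the named LLD resampled to the requested number of frames. */
    float[] get(String lldName, int targetFrames) {
        float[] src = series.get(lldName);
        if (src == null) return null;            // caller may cross-predict it
        if (src.length == targetFrames) return src;
        float[] dst = new float[targetFrames];
        for (int i = 0; i < targetFrames; i++) {
            // map target frame i back onto the source time axis
            float pos = targetFrames == 1 ? 0
                    : i * (src.length - 1) / (float) (targetFrames - 1);
            int lo = (int) pos;
            int hi = Math.min(lo + 1, src.length - 1);
            float frac = pos - lo;
            dst[i] = src[lo] * (1 - frac) + src[hi] * frac;  // linear interpolation
        }
        return dst;
    }
}
```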

5. Projection into a metric space

To exploit the benefits of computational intelligence (e.g. neural networks) and perform clustering, matching, comparison and classification, each MPEG-7 resource has to be projected onto a single, fixed-dimension vector in a consistent and mathematically justified way. In the next sections we discuss how we perform this task by means of projection operators and input adaptation structures.

5.1 Recursive adaptation

The following step is best understood as driven by a “feature space request”. A “feature space” deemed appropriate for the desired computational intelligence task is composed of couples (one per dimension) of feature names and functions capable of projecting a series of scalars (or vectors) onto a single scalar value. Among these, the framework provides a full set of classical statistical operators (mean, variance, higher data moments, median, percentiles, etc.) that can be cascaded with other “pre-processing” such as filtering. Since MPEG-7 descriptions coming from different sources and processes can have different available features, and not necessarily those that we have selected as the application “feature space”, a recursive cross prediction of the missing ones takes place. This process is described in Figure 4.
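A possible shape for such a feature space request, building on the AudioCompactType sketch above (names are ours, not the actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/**
 * An ordered list of couples (LLD name, projection operator), each
 * operator collapsing a time series into one scalar, so that every
 * clip maps onto one fixed-dimension vector.
 */
class FeatureSpace {
    private final List<String> lldNames = new ArrayList<>();
    private final List<ToDoubleFunction<float[]>> operators = new ArrayList<>();

    void addDimension(String lldName, ToDoubleFunction<float[]> op) {
        lldNames.add(lldName);
        operators.add(op);
    }

    /** Projects one clip's LLD series (resampled to a common frame count). */
    double[] project(AudioCompactType act, int frames) {
        double[] v = new double[lldNames.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = operators.get(i).applyAsDouble(act.get(lldNames.get(i), frames));
        }
        return v;
    }

    // Two of the classical statistical operators mentioned above.
    static double mean(float[] xs) {
        double s = 0;
        for (float x : xs) s += x;
        return s / xs.length;
    }

    static double stdDev(float[] xs) {
        double m = mean(xs), s = 0;
        for (float x : xs) s += (x - m) * (x - m);
        return Math.sqrt(s / xs.length);
    }
}
```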

For each of the required features, the system has a series of “registered providers”. Each of these is bound to provide the specified feature given a list of prerequisite features it relies on. For each possible feature, an “obvious feature provider” is present that simply delivers the original data from the MPEG-7 ACT, if available. Other registered providers generally contain cross prediction algorithms to estimate the requested feature from others found to be somehow related. Cross-feature estimation methods based on transformations and signal processing techniques currently implemented include:

• AudioTempoType DS from the AudioPower DS and/or from AudioSpectrumCentroid;

• AudioPower DS from the AudioSpectrumEnvelope DS (ASE);

• AudioSpectrumEnvelope DS from the AudioSpectrumBasis + AudioSpectrumProjection DS (as specified in the MPEG-7 document).

It is also interesting to note that, when a direct algorithm is not available, cross prediction based on neural networks proves to be a viable alternative for a selected number of features.
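A possible shape for the provider contract described above, again building on the AudioCompactType sketch (the interface is ours, not the project's plugin API):

```java
/**
 * Each provider declares the LLD it delivers, the prerequisite LLDs it
 * relies on, and an a priori quality estimate: 1 for the "obvious"
 * provider, less for cross prediction, possibly more than 1 for
 * enhancement of features extracted from lossy sources (see below).
 */
interface FeatureProvider {
    String provides();                                 // e.g. "AudioPower"
    String[] requires();                               // prerequisite LLDs
    double quality();                                  // a priori quality estimate
    float[] derive(AudioCompactType act, int frames);
}

/** The trivial case: deliver the original series itself, quality 1. */
class ObviousProvider implements FeatureProvider {
    private final String name;

    ObviousProvider(String name) { this.name = name; }

    public String provides() { return name; }
    public String[] requires() { return new String[0]; }
    public double quality() { return 1.0; }

    public float[] derive(AudioCompactType act, int frames) {
        return act.get(name, frames);   // null if the LLD is absent
    }
}
```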

We define:

$$\mathrm{S/N\ ratio} = \frac{10}{N} \sum_{i=1}^{N} \log_{10} \frac{\sum_{n=1}^{r} \bar{x}_n^2}{\sum_{n=1}^{r} \left( x_n - \bar{x}_n \right)^2} \qquad (1)$$

where $x_n$ is the $n$th scalar component of the cross-predicted MPEG-7 LLD, $\bar{x}_n$ is the same component of the reference LLD, $r$ is the dimension of the descriptor, $N$ is the number of frames (e.g. one frame per 10 ms hop size, the MPEG-7 default), and the inner sums are taken frame by frame.
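A direct reading of (1) in code, under the assumption that both LLDs are stored as [frame][component] arrays with identical framing (a sketch, with names of our choosing):

```java
class QualityMetrics {

    /**
     * S/N ratio (1) in dB: for each frame, the energy of the reference
     * LLD over the energy of the prediction error, log-averaged over
     * all frames. x is the cross-predicted LLD, xBar the reference.
     */
    static double snRatioDb(double[][] x, double[][] xBar) {
        int frames = xBar.length;              // N
        double sum = 0;
        for (int f = 0; f < frames; f++) {
            double signal = 0, noise = 0;
            int r = xBar[f].length;            // descriptor dimension
            for (int n = 0; n < r; n++) {
                signal += xBar[f][n] * xBar[f][n];
                double e = x[f][n] - xBar[f][n];
                noise += e * e;
            }
            sum += Math.log10(signal / noise);
        }
        return 10.0 / frames * sum;
    }
}
```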

Table 1 shows the performance of a Multi-Layer Perceptron (MLP) based on a single hidden layer and trained with data from audio clips of different genres.

Each of the registered “feature providers” is also bound to provide an a priori estimate of extraction quality. By definition, the “obvious feature provider” returns a quality of 1, while cross prediction techniques return less. Interestingly, it is also possible to specify providers able to deliver a feature with a quality greater than 1. This is the case when the MPEG-7 description has been extracted from lossy audio formats.

To measure the impact of lossy compression on MPEG-7 LLD quality we define the S/N ratio as shown in (1), but consider $x_n$ as the MPEG-7 LLD extracted from the lossy source and $\bar{x}_n$ as the same LLD from the clean (PCM) source.

It has been shown in [19] that the noise so introduced is not negligible, with S/N ratios, in the case of 128 kbps MP3-encoded music, in the area of 33 dB for AudioWaveform and 39 dB for AudioPower.

Table 1 Signal-to-noise ratio of cross-predicted MPEG-7 features. In this experiment each feature was predicted from the remaining three.

MPEG-7 feature      Cross-predicted value S/N (dB)
Power               8.42
Waveform            11.96
Spectrum Spread     11.16
Spectrum Centroid   9.17

To increase the S/N ratio we experimented with “feature providers” based on MLPs. A single-hidden-layer network was used, trained with four different LLDs derived from audio belonging to different categories and encoded in MP3 format; as targets, the network was given the “clean” (PCM-derived) version of one of them. Gradient descent with momentum and regularization by testing on a separate population were applied in the learning phase. Some significant time series are shown in Figure 3, and overall results are shown in Table 2:

Table 2 AudioSpectrumSpread (ASS) and AudioSpectrumCentroid (ASC) enhancement by an ARX NN model; values in dB.

            64 kb/s MP3      128 kb/s MP3     160 kb/s MP3
Feature     ASS     ASC      ASS     ASC      ASS     ASC
Original    24.31   27.38    29.21   32.33    28.98   32.41
Enhanced    27.28   32.19    32.49   35.24    31.52   35.48
Gain        3.80    4.81     3.28    2.91     2.54    2.56
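At lookup time, such an enhancement provider only applies a trained network. The sketch below shows just the run-time forward pass of a single-hidden-layer MLP with tanh units and a linear output; the ARX-style input windowing and the training loop (gradient descent with momentum, early stopping on a validation set) are omitted, and all names are ours:

```java
/** Maps degraded LLD values to an estimate of the clean value. */
class MlpEnhancer {
    private final double[][] w1;  // [hidden][inputs] hidden-layer weights
    private final double[] b1;    // [hidden] hidden-layer biases
    private final double[] w2;    // [hidden] output weights
    private final double b2;      // output bias

    MlpEnhancer(double[][] w1, double[] b1, double[] w2, double b2) {
        this.w1 = w1; this.b1 = b1; this.w2 = w2; this.b2 = b2;
    }

    /** One forward pass: degraded feature values in, enhanced value out. */
    double enhance(double[] inputs) {
        double out = b2;
        for (int h = 0; h < w1.length; h++) {
            double a = b1[h];
            for (int i = 0; i < inputs.length; i++) {
                a += w1[h][i] * inputs[i];
            }
            out += w2[h] * Math.tanh(a);   // tanh hidden unit, linear output
        }
        return out;
    }
}
```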

Overall, the adaptation algorithm can operate recursively. In fact, as long as loops are prevented, there is no reason why a “feature provider” (other than the “obvious feature provider”) should directly access the original MPEG-7 ACT type while providing an estimate of a requested LLD.

The top part of Figure 4 illustrates the overall process. Let us consider as an example the case in which the feature AudioPower (AP) is requested, but only AudioSpectrumBasis + Projections (ASB+ASP) are available in the original MPEG-7 stream. Since the “obvious feature provider” for AP will fail, the algorithm resorts to the next available adapter (ASE to AP), which in turn invokes the “ASB+ASP to ASE” feature provider. The adaptation chain will necessarily introduce some form of uncertainty. To cope with this, each feature provider is requested to provide an estimate of its ability to reconstruct the requested feature given its own prerequisites. By recursive multiplication of the “qualities” provided by the plugins in the adaptation chain for each specific request, we are able to determine the best trajectory through the adaptation cascade and ultimately provide an overall quality measure of the obtained feature, which can be used for filtering or later weighting purposes.
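A minimal sketch of this recursive quality computation, building on the FeatureProvider sketch above (the availability check performed by the “obvious feature provider” is omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * For a requested feature, consider every registered provider, multiply
 * its own quality by the best achievable qualities of its prerequisites,
 * and keep the best chain. The "visiting" set is the loop guard.
 */
class AdaptationResolver {
    private final List<FeatureProvider> providers = new ArrayList<>();

    void register(FeatureProvider p) { providers.add(p); }

    /** Best achievable quality for a feature; 0 if it is unreachable. */
    double bestQuality(String feature, Set<String> visiting) {
        if (!visiting.add(feature)) return 0;          // loop prevented
        double best = 0;
        for (FeatureProvider p : providers) {
            if (!p.provides().equals(feature)) continue;
            double q = p.quality();
            for (String prerequisite : p.requires()) {
                q *= bestQuality(prerequisite, visiting);  // recursive product
            }
            best = Math.max(best, q);
        }
        visiting.remove(feature);
        return best;
    }
}
```

Called as bestQuality("AudioPower", new HashSet<>()), this returns 1 when the obvious provider applies, and otherwise the product of the chain qualities (e.g. ASB+ASP to ASE to AP in the example above).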

In this context it is also possible to insert adapters for features that are not available in the MPEG-7 standard. By providing a “provider plugin” which operates on top of the recursive structure described above, specific algorithms extracting new metadata (from existing LLDs) can be plugged in, and the overall structure remains fully MPEG-7 compatible. As an example, in this study we developed a “provider plugin” that supplies an advanced version of the audio BPM; the new feature can be projected onto a metric space exactly like those belonging to the MPEG-7 standard (as in Figure 4).

6. Once in a vector space

Once obtained, the projection vectors representing the MPEG-7 files in the DB can be processed using a variety of well-known techniques. The MPEG7DB project already provides tools such as neural networks, statistical classifiers and approximators, but the vectors can also readily be exported for use in any external engine. In section 8 we show a toy application where the projections are fed to a supervised classifier, which is then applied to a test set. Note that the fundamental issue in many usage scenarios (like “query by example”) of having low-dimensionality indexes for efficient retrieval can at this point be solved by indexing the projected MPEG-7 using well-known techniques such as R-Trees or Quad-Trees.

Figure 3 MPEG-7 feature enhancement. Features extracted from lossy compressed audio are enhanced by an ARX neural model.

7. Reaching the Semantic Web with models and results

Although the architecture described here can readily be used to create standalone MPEG-7 applications, as previously highlighted (see Figure 1), the structure is specifically built to allow delivery of annotations on the Semantic Web.

The URI-based database and the overall distributed and tolerant handling of the standardized multimedia metadata create the necessary conditions for the interoperable reuse of the results.

In fact, once higher-level semantic information has been extracted (as in the application described in section 8), it is likely to be informative and terse enough to be tractable with current Semantic Web technologies (RDF reasoners and the like), which are usually much closer to graph matchers than to signal processing or computational intelligence techniques. To allow this, the architecture provides Semantic-Web-enabled “result containers”. These objects, once provided with an OWL ontology describing the valid objects and properties in the domain of interest ([21]), store high-level results and can produce RDF (Resource Description Framework, see [22]) streams as final output.
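As an illustration of what such a “result container” might look like, here is a sketch using Apache Jena rather than the project's own classes; the namespace and the belongsToGroup property are hypothetical stand-ins for terms the OWL ontology would define:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class ResultContainer {
    // Hypothetical ontology namespace, standing in for the real one.
    private static final String NS = "http://example.org/audio-ontology#";
    private final Model model = ModelFactory.createDefaultModel();

    /** Assert that the clip identified by uri belongs to the named group. */
    public void assertBelongs(String uri, String groupName) {
        Property belongsTo = model.createProperty(NS, "belongsToGroup");
        Resource group = model.createResource(NS + groupName);
        model.createResource(uri).addProperty(belongsTo, group);
    }

    /** Serialize all accumulated assertions as an RDF/XML stream. */
    public void write(java.io.OutputStream out) {
        model.write(out, "RDF/XML");
    }
}
```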

Once annotations of objects identified by URIs have been produced in RDF+OWL, we consider the “Semantic Web” to have been reached, where a variety of external agents will later be able to provide further functionality.

8. An experimental SW-MPEG-7 application

All the work presented here has been implemented in Java (see [5] on why this is also computationally acceptable) and is available for review, suggestions and collaborative enhancement in the free software/open source model [7].

As an example application, we used the described framework to classify the quality of speech-content audio files using a rating very similar to what is proposed in the MPEG-7 amendment specifications under “Transmission technology”, basically a quality rating of the medium that is “transmitting” the sound (e.g. a vinyl record). We proceeded by creating several high-quality audio recordings from different speakers and artificially degrading them into 8 categories roughly matching quality 2 (pre-1925 shellac recordings) to 9 (CD quality) as described in [23]. During the training phase, part of the recordings was labeled and, by the described mechanism, projected into a 6-dimension vector space composed of the mean and the standard deviation of the AudioPower, AudioSpectrumSpread and AudioSpectrumCentroid MPEG-7 LLDs. The vector representation so obtained was then used to train a single-hidden-layer MLP neural network classifier with a single output neuron requested to estimate the quality as a scalar value from 1 to 8. Once trained, according to the well-known separate training/validation set criterion to achieve generalizing learning, we proceeded to test the classifier on new unlabelled audio samples, obtaining the results shown in Table 3.
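Wiring this experiment together with the earlier sketches might look as follows; trainMlp and MlpClassifier are hypothetical stand-ins for the framework's MLP tools:

```java
// Build the 6-dimension feature space used in the experiment:
// mean and standard deviation of three MPEG-7 LLDs.
FeatureSpace fs = new FeatureSpace();
for (String lld : new String[] {
        "AudioPower", "AudioSpectrumSpread", "AudioSpectrumCentroid" }) {
    fs.addDimension(lld, FeatureSpace::mean);
    fs.addDimension(lld, FeatureSpace::stdDev);
}

// For each labeled clip: double[] v = fs.project(act, frames);
// Train on the labeled vectors (hypothetical framework call):
//   MlpClassifier c = trainMlp(trainingVectors, qualityLabels);
// Classify a new clip, rounding the scalar output to a category 1..8:
//   int category = (int) Math.round(c.estimate(fs.project(newAct, frames)));
```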

Once the high-level “semantic assertions” from the classifier are available (e.g. audio clip A has quality “8”), the “RDF Semantic Container” can store them and, if an appropriate OWL ontology is provided, express them as an RDF stream.

Table 3 Classification results using 3 features (AudioPower, SpectrumSpread, SpectrumCentroid) and 2 operators (mean, std. dev.). Correct category match: 81%; correct or neighboring category match: 100%.

Real category       1    1    2    2    3    3    4    4    5    5    6    6    7    7    8    8
Obtained value      1.03 1.05 2.16 1.98 3.11 2.85 4.08 4.12 5.57 5.44 6.76 7.18 6.65 —    7.99 7.96
Estimated category  1    1    2    2    3    3    4    4    6    5    7    7    7    7    8    8

While it would have been immediate to use existing ontologies, in this example application we defined our own simple one using the OWL language [10], matching with simplicity the key concepts involved. These key concepts match the problem closely and therefore allow expressing a “classification model” composed of a set of features and operators on them, a “group” as a container for clips having certain similarity characteristics, and the “belonging” of a particular clip to a particular group spawned by a given “classification model”.

9. Conclusions

Even in this early experimentation phase, the architecture performs as expected and succeeds in bridging multimedia to the Semantic Web by “distilling” verbose and heterogeneous audio metadata representations from distributed sources into concise and semantically relevant information, published along with the related machine-readable ontology.

As further work, we believe it should be explored how the current platform can grow to include tools (e.g. indexing systems) addressing different scenarios and use cases while keeping the same philosophy of optimal use of the available metadata. Also, since the results published on the Semantic Web could bear precise annotations about the techniques used to obtain them, it would be interesting to consider if and how results coming from different sources could be merged or reused.

10. Acknowledgments

Special gratitude goes to Holger Crysandt, Johan Johansson and Giuliano Marozzi for the support provided.

11. References

[1] ISO/IEC JTC1/SC29/WG11 N4031, MPEG-7 (2001).

[2] MPEG-7 XM, http://www.lis.ei.tum.de/research/bv/topics/mmdb/e_mpeg7.html

[3] ISO/IEC JTC1/SC29/WG11 N5527, MPEG-7 Profiles under Consideration, March 2003, Pattaya, Thailand.

[4] Utz Westermann, Wolfgang Klas, “An Analysis of XML Database Solutions for the Management of MPEG-7 Media Descriptions”, ACM Computing Surveys (CSUR), Dec. 2003.

[5] Ronald F. Boisvert, Jose Moreira, Michael Philippsen, and Roldan Pozo, “Java and Numerical Computing”, IEEE Computing in Science and Engineering, March/April 2001.

[6] Holger Crysandt, Giovanni Tummarello, MPEG7AUDIOENC, http://sf.net/projects/mpeg7audioenc

[7] G. Tummarello, C. Morbidoni, F. Piazza, http://sf.net/projects/mpeg7audiodb

[8] Jane Hunter, “Enhancing the semantic interoperability through a core ontology”, IEEE Transactions on Circuits and Systems for Video Technology, special issue, Feb. 2003.

[9] Ralf Klamma, Marc Spaniol, Matthias Jarke, “Digital Media Knowledge Management with MPEG-7”, WWW2003, Budapest.

[10] CWMpeg7Inference.owl, an ontology for inferences on sets mainly aimed at computational intelligence algorithms, http://semanticweb.deit.univpm.it/ontologies/CWMpeg7Inference.owl

[11] ISO/IEC JTC1/SC29/WG11 N5525, MPEG-7 Overview, Revision 9, March 2003.

[12] Alicja A. Wieczorkowska, Jakub Wroblewski, Piotr Synak, “Application of Temporal Descriptors to Musical Instrument Sound Recognition”, Journal of Intelligent Information Systems, (1): 71-93, July 2003.

[13] Tim Berners-Lee et al., “Web Architecture from 50,000 feet”, revised 2002, http://www.w3.org/DesignIssues/Architecture.html

[14] Juan José Burred and Alexander Lerch, “A Hierarchical Approach to Automatic Musical Genre Classification”, 6th International Conference on Digital Audio Effects, London, September 2003.

[15] Xi Shao, Changsheng Xu, Mohan S. Kankanhalli, “Applying Neural Network on Content-Based Audio Classification”, IEEE Pacific-Rim Conference on Multimedia (PCM03), Singapore, 2003.

[16] F. Pachet, A. Laburthe, J.-J. Aucouturier, “The Cuidado Music Browser”, CBMI 03, Rennes (FR).

[17] Allamanche, E., Herre, J., Hellmuth, O., Fröba, B., Kastner, T., and Cremer, M., “Content-Based Identification of Audio Material Using MPEG-7 Low Level Description”, Proceedings of the Int. Symposium on Music Information Retrieval, 2001.

[18] http://p2p.semanticweb.org, on P2P and the exchange of metadata.

[19] J. Lukasiak, D. Stirling, M.A. Jackson, N. Harders, “An Examination of Practical Information Manipulation Using the MPEG-7 Low Level Audio Descriptors”, 1st Workshop on the Internet, Telecommunications and Signal Processing.

[20] Holger Crysandt, “Music Identification with MPEG-7”, AES 115th Convention, New York, 2003.

[21] OWL, http://www.w3.org/2001/sw/WebOnt/

[22] RDF, http://www.w3.org/RDF/

[23] Classification Schemes used in ISO/IEC 15938-4: Audio, ISO/IEC JTC1/SC29/WG11 N5727, Trondheim, Norway, July 2003.