
Implementing situation-aware and user-adaptive music recommendation service in semantic web and real-time multimedia computing environment

Seungmin Rho & Seheon Song & Yunyoung Nam &

Eenjun Hwang & Minkoo Kim

Published online: 11 May 2011
© Springer Science+Business Media, LLC 2011

Abstract With the advent of the ubiquitous era, many studies have been devoted to various situation-aware services in the semantic web environment. One of the most challenging studies involves implementing a situation-aware personalized music recommendation service which considers the user's situation and preferences. Situation-aware music recommendation requires multidisciplinary efforts including low-level feature extraction and analysis, music mood classification and human emotion prediction. In this paper, we propose a new scheme for a situation-aware/user-adaptive music recommendation service in the semantic web environment. To do this, we first discuss utilizing knowledge for analyzing and retrieving music contents semantically, and a user-adaptive music recommendation scheme based on semantic web technologies that facilitates the development of domain knowledge and a rule set. Based on this discussion, we describe our Context-based Music Recommendation (COMUS) ontology for modeling the user's musical preferences and contexts, and supporting reasoning about the user's desired emotions and preferences. Basically, COMUS defines an upper music ontology that captures concepts on the general properties of music such as titles, artists and genres. In addition, it provides functionality for adding domain-specific ontologies, such as music features, moods and situations, in a hierarchical manner, for extensibility. Using this context ontology, we believe that high-level (implicit) knowledge such as situations can be inferred from low-level (explicit) knowledge through logical reasoning rules. As an innovation, our ontology can express detailed and complicated relations among music clips, moods and situations, which enables users

Multimed Tools Appl (2013) 65:259–282
DOI 10.1007/s11042-011-0803-4

S. Rho, E. Hwang (*)
School of Electrical Engineering, Korea University, Seoul, Korea
e-mail: [email protected]

S. Song
Graduate School of Information and Communication, Ajou University, Suwon, Korea

Y. Nam
Center of Excellence for Ubiquitous System, Ajou University, Suwon, South Korea

M. Kim
Division of Information and Computer Engineering, Ajou University, Suwon, Korea

to find appropriate music. We present some of the experiments we performed as a case study for music recommendation.

Keywords Customization . Ontology . Reasoning . Semantic web . User profiles

1 Introduction

With recent progress in the field of music information retrieval, we face a new era in which computers can analyze and understand music automatically to some semantic level. However, due to the diversity and richness of music contents, this goal requires multidisciplinary efforts, ranging from computer science, digital signal processing, mathematics and statistics, to musicology. Most traditional content-based music retrieval (CBMR) techniques [2, 30] have focused on low-level features such as energy, zero crossing rate, audio spectrum, etc. However, these features fail to provide semantic information about music contents, and this is a serious limitation in retrieving and recommending appropriate music for situations that share the same time frame but differ in location and activity. For example, consider a person who wants to listen to soft music when they wake up in the morning but prefers fast-beat music when they exercise at the gym. For this, more semantic information such as mood and emotion must be recognized from low-level features such as beat, pitch, rhythm and tempo. As another example, consider a person who is very sad for some reason. Depending on their personality, they may want to listen to some music that may cheer them up, or easy music that can make them calm down.

Due to the abovementioned limitations of low-level feature-based approaches, some researchers have tried to bridge the semantic difference, also known as the semantic gap, between low-level features and high-level concepts [23, 24]. Using only low-level feature analysis can result in many difficulties in identifying the semantics of musical contents. Similarly, it is difficult to correlate high-level features with music semantics. For instance, a user's profile, which includes educational background, age, gender and musical taste, is one possible high-level feature. Semantic web technology has been considered a promising way to bridge this semantic gap. Recently, with the development of the semantic web, ontologies have been widely used to assist knowledge sharing and reuse. Also, work on the emotional effects of music [12] suggested that the emotion experienced by a person listening to some music is determined by a multiplicative function of several factors such as structural, performance, listener and contextual features.

In this paper, we try to tackle the abovementioned problem in the domain of music recommendation by combining content-based music retrieval, music ontology and domain-specific ontologies such as Mood and Situation. More specifically, based on the basic concepts from upper ontologies, which can be found in previous ontology-related projects such as the Music Ontology [39, 40], the Kanzaki taxonomy [11] and MusicBrainz [20], we can define more specific domain-oriented ontologies. The Music Ontology is an effort led by ZitGist LLC and the Centre for Digital Music that aims to express music-related information on the semantic web. It attempts to express all the relations between musical information, enabling users to find any desired information about music and musicians.

In our scenario for music recommendation, we considered musical terms as concepts. The relations consist of several types, including formal ontological relations such as 'is-a' and 'has-a.' We deal with these two formal relations to indicate the specialization of concepts and required parts. For example, mood, genre and music features are part-of music, and each of MFCC, tempo, onset, loudness and chroma is-a music feature. The other important relation is 'like-song/singer/genre/mood'. This relation is used to describe the user's musical and emotional preferences.

During the last decade, many researchers have investigated the influence of musical factors such as loudness and tonality on the perceived emotional expression. They analyzed those data using diverse techniques, some of which involve measuring psychological and physiological correlations between the state of a particular musical factor and emotion evocation. In [10, 17], the authors investigated the use of acoustic cues in the communication of music emotions by performers and listeners, and measured the correlations between emotional expressions (such as anger, sadness and happiness) and acoustic cues (such as tempo, spectrum and articulation). In this paper, we will also use the rules for acoustic and emotional expressive cues outlined in [10].

A mood/emotion descriptor has been useful and effective in describing music taxonomy. An assumption for emotion representation is that emotion can be considered as a set of continuous quantities and mapped to a set of real numbers. As a pioneering effort to describe human emotions, Russell [33] proposed a circumplex model where each affect is displayed over two bipolar dimensions. Those two dimensions are pleasant-unpleasant and arousal-sleep. Thus, each affect word can be defined as some combination of pleasure and arousal components. Later, Thayer [35] adapted Russell's model to music. Thayer's model has "arousal" and "valence" as its two main dimensions. In this model, emotion terms are described as ranging from silent to energetic along the arousal dimension, and from negative to positive along the valence dimension. As shown in Fig. 1, the two-dimensional emotion plane can be divided into four quadrants with 11 emotion adjectives placed over them.

In many cases, the terms emotion and mood have been used interchangeably. It is noted that in most psychology-related books and papers [26], "emotion" usually has a short duration (seconds to minutes) while "mood" has a longer duration (hours or days). Accordingly, in this paper we use the word "emotion" for the ephemeral states handled within the time frame of our fuzzy system, whereas the word "mood" refers to an extended time frame that begins with listening and ends at some indiscriminate point after listening is complete.

Fig. 1 Modified version of Thayer's 2-dimensional emotion model


In this paper, we develop extended music ontologies [34] to enable mood and situation reasoning in a music recommendation system. These ontologies are described in the Web Ontology Language (OWL) using the Protégé editor. OWL provides three increasingly expressive sublanguages: OWL Lite, OWL Description Logic (DL) and OWL Full. OWL DL provides maximum expressiveness while retaining computational completeness and decidability, and thus is the implementation of choice when efficient reasoning support is desired. There are several reasons why a DL language is suitable for our ontology-based approach and motivated our choice. A DL language has a uniform syntax, unambiguous semantics and a clear separation between concepts (classes) and instances (individuals). Therefore, a DL-based reasoning engine can answer semantic queries with regard to various types of mood and situation. For recommendation, we use the SPARQL query language in conjunction with our extended music OWL ontology to find appropriate music. For example, when a user creates a profile containing their musical preferences, it is not reasonable to expect them to specify all the required details in the profile. In that case, the missing information can be inferred from the partial information in the profile.

In the following section, we present a brief overview of the current state-of-the-art music recommendation systems, music-related ontologies and emotion models. In Section 3, we illustrate the musical feature extraction scheme and provide an overview of our recommendation system. Section 4 describes the extended music ontologies and their knowledge representation schemes and demonstrates their usage in typical scenarios. Experimental results are provided in Section 5. In the last section, we conclude the paper with some observations and directions for future work.

2 Related work

Many studies have been done in the area of music recommendation in the past few years. In this section, we first present a brief overview of the popular emotion models, and then review some of the recently developed music recommendation systems and music-related ontologies.

2.1 Emotional model

Traditional mood and emotion research has focused on finding solutions for emotion recognition and classification from psychological and physiological signals. During the 1980s, several emotion models were proposed, which were largely based on dimensional approaches to emotion rating. Ortony et al. [22] developed their theoretical approach under the assumption that emotions develop as a consequence of certain cognitions and interpretations. Therefore, their theory exclusively concentrates on the cognitive elicitors of emotions. The authors claimed that these cognitions are determined by three aspects: events, agents and objects. These events, agents or objects are appraised according to an individual's goals, standards and attitudes. Emotions represent valenced reactions to these perceptions of the world. Someone can be pleased about the consequences of an event or not (pleased/displeased); they can endorse or reject the actions of an agent (approve/disapprove); and they can like or not like aspects of an object (like/dislike) (Ruebenstrunk [32]). On the other hand, the dimensional approach focuses on identifying emotions based on their location on a small number of dimensions such as valence and activity. Russell's [33] circumplex model has had a significant effect on emotion research. This model defines a two-dimensional, circular structure involving the dimensions of activation and valence.


Within this structure, emotions on opposite sides of the circle, such as sadness and happiness, correlate inversely. Moreover, the same circular structure has been found in a large number of different domains including music. Thayer [35] suggested a two-dimensional emotion model that is simple but powerful in organizing different emotion responses: stress and energy. The dimension of stress is called valence while the dimension of energy is called arousal.

2.2 Music recommendation system

Most of the existing music recommendation systems are based on the user's musical preferences derived from their listening history. Modern users have been enjoying popular music recommendation sites such as Last.fm [15], GarageBand [6], and MyStrands [21], which extrapolate correlations between like-minded listeners to create recommendations. However, some users want to listen to music suitable to their specific emotions and situations. We believe that it is becoming more important for people to express individual emotional states and situations and the relationship between listeners and tunes. To the best of our knowledge, no effort has been made to measure and utilize the effect of music with respect to emotions and situations.

In general, there are two major approaches for music recommendation: content-based and collaborative filtering-based. The former analyzes the content of music that the user has preferred in the past and recommends similar music. The latter recommends music that is preferred by a user group with similar preferences.

Cano et al. [3] presented MusicSurfer, a metadata-free system, to provide content-based music recommendation. MusicSurfer automatically extracts descriptions related to instrumentation, rhythm and harmony from music audio signals. They demonstrated a music browsing and recommendation system based on a high-level music similarity metric computed directly from audio data. Pauws et al. [27] developed Personalized Automatic Track Selection (PATS) using a collaborative filtering method. To create playlists, it uses a dynamic clustering method in which songs are grouped based on their attribute similarity. An inductive learning algorithm based on decision trees is then employed to reveal the attribute values that might explain the removal of songs.

2.3 Ontology

2.3.1 The music ontology

The Music Ontology [39, 40] is an effort led by ZitGist LLC and the Centre for Digital Music that aims to express music-related information on the Semantic Web. The Music Ontology is an attempt to link all the information about musical artists, albums and tracks. The goal is to express all the relations between musical information, enabling users to find any desired information about music and musicians. It is based around the use of machine-readable information provided by any web site or web service.

The Music Ontology is mainly influenced by several ontologies such as FRBR (Functional Requirements for Bibliographic Records), Event, Timeline, ABC and FOAF.

The Music Ontology can be divided into three levels of expressiveness, from simple to more complex. Level 1 aims at providing a vocabulary for simple editorial information (tracks/artists/releases, etc.). Level 2 aims at providing a vocabulary for expressing the music creation workflow (composition, arrangement, performance, recording, etc.). Level 3 aims at providing a vocabulary for complex event decomposition, to express things such as the events during a particular performance, the melody line of a particular work, etc.

2.3.2 Kanzaki vocabulary

Kanzaki [11] is a music vocabulary which describes classical music and performances. It defines classes for musical works, events, instruments and performers, as well as related properties. It distinguishes musical works from performance events. The current version of this vocabulary has been tested using a model to describe a musical work and its representations (performances, scores, etc.) and a musical event to describe a representation such as a concert. This ontology consists of 112 classes, 34 property definitions and 30 individuals.

2.3.3 The friend of a friend (FOAF)

The FOAF project [23] aims at creating a machine-readable ontology describing people and their activities and relations to each other and to objects. Anyone can use FOAF to describe themselves. FOAF allows groups of people to describe social networks without the need for a centralized database. FOAF is an extension to the RDF (Resource Description Framework) [36] and can be defined using OWL (Web Ontology Language). For example, computers may use FOAF profiles to find everyone living in a certain area, or to list persons of interest. This is accomplished by defining people's relationships: each profile has a unique identifier, such as the person's e-mail address, Jabber ID, or the URI of their homepage or weblog, which is used to define these relationships.

2.3.4 Other multimedia ontologies

Recently, many researchers have focused on using semantic web technologies and standard multimedia metadata such as multimedia ontologies [31] or MPEG-7 [19] to deal with multimedia metadata interoperability problems. Poppe et al. [28] proposed a semantic and layered metadata model and integrated it into a video surveillance system. They created a global OWL-based ontology called the VSS ontology and linked it to the XML-based Computer Vision Markup Language (CVML) [16] using rules and mappings.

Richard et al. [31] presented the COMM (Core Ontology for Multimedia) ontology, an MPEG-7 based multimedia ontology that is well-founded and composed of multimedia patterns. Its model offers even more possibilities for multimedia annotation than MPEG-7 since it is interoperable with existing web ontologies. To support conceptual clarity and extensibility towards annotation requirements, they also used the core ontology DOLCE. Krzysztof et al. [14] proposed a technique that compares ontologies as whole structures to assess the similarity between users' profiles and multimedia objects in a recommender system. To verify the usefulness of the proposed idea, they developed a multimedia sharing system for use with the well-known system Flickr.

3 Music recommendation system

In this section, we describe the overall architecture of our situation-based, user-adaptive music recommendation system, the musical feature extraction process and some implementation details.


3.1 System overview

In this paper, we have implemented a prototype music recommendation system to bring music to the semantic web, together with the extended music ontology that enables mood and situation reasoning in the music recommendation domain. As shown in Fig. 2, our system consists of three main components: COMUS (COntext-based MUsic Recommendation System) along with its ontology, the score-based recommendation module and the Mood Recognizer. The COMUS system also provides various types of query interfaces to the users. The user can formulate queries using one of three different query interfaces: query by situation (QBS), query by detailed situation (QBDS), and query by mood (QBM). Our COMUS ontology is described in the OWL language using the Protégé editor. During the construction of the COMUS ontology, we use RacerPro for checking consistency, subsumption reasoning, and implicit knowledge inference. More details about the ontology are described in Section 4. For retrieval and recommendation, the Jena SPARQL engine is used to express and process the necessary queries over our ontology. Screenshots of the query and result interfaces are shown in Fig. 3.

In the score-based recommendation module, items (music) are clustered into small groups using the k-means clustering algorithm. Then, item-to-item similarity scores are calculated using Pearson's correlation coefficient.
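As a minimal sketch of this similarity step (not the authors' implementation), the following Python fragment computes item-to-item Pearson correlations with NumPy and returns the most similar songs. The toy listening matrix and the helper name recommend_similar are illustrative assumptions, and the k-means grouping that precedes this step in the actual system is omitted.

import numpy as np

# Illustrative user-song listening matrix: rows are users, columns are songs.
# In the deployed system these counts would come from the listening history,
# and songs would first be grouped with k-means before scoring.
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 0, 5, 4]], dtype=float)

song_vectors = ratings.T                  # one row per song
similarity = np.corrcoef(song_vectors)    # Pearson correlation, songs x songs

def recommend_similar(song_idx, top_n=2):
    """Return the indices of the top_n songs most correlated with song_idx."""
    scores = [(j, similarity[song_idx, j])
              for j in range(similarity.shape[0]) if j != song_idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [j for j, _ in scores[:top_n]]

print(recommend_similar(0))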

In the COMUS system, the context knowledge base stores the tree-like knowledge used for reasoning about the user's mood and situation. First, the music ontology as upper-level ontology stores ABox knowledge as OWL instances, which are derived from the Music Ontology project [39, 40]. Our music recommendation domain ontologies (User Preference, Mood and Situation) use TBox knowledge as concepts and relationships rather than instances. Moreover, we defined various logic rules to associate our domain ontologies with the Music Ontology. These rules act as semantic mappers between the ontologies, enabling us to semantically correlate them for the purpose of sharing and reuse. Using these mappings, we can retrieve useful instance data from other domain ontologies instead of using only the Music Ontology.

Fig. 2 System architecture (the COMUS system comprises a Music Features Extractor for pitch, tempo, loudness, tonality, key, rhythm and harmony; a Mood Recognizer with Mood Computation, Mood Labeling and a Mood Mapper over the 11-mood map; a Context Knowledge Base with rule-based and ontology-based KBs and a reasoner; the COMUS ontology covering User Preference, Music, Situation and Mood for ontology-based recommendation; and a score-based recommendation module using item-based filtering over clustered item vectors and the listening history)


Fig. 3 Screenshots of query and result interface


The Mood Recognizer consists of three parts: (i) the Music Feature Extractor extracts seven distinct musical features and analyzes them (see Section 3.2); (ii) the Mood Computation module computes each song's arousal/valence (AV) values from the analyzed musical features using the acoustic and emotion rules defined by Juslin et al. [10] (example relations are described in Table 1); and (iii) the Mood Mapper maps the AV values to a mood. In the Mood Mapper, we used the labeling feedback provided by users in order to minimize the mood difference. For accurate mood mapping, the Mapper is tuned during a training step by calculating the distance between the user-labeled mood and the system-calculated mood. A more detailed description of fuzzy-based mood mapping can be found in [9].
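The fuzzy-based mapping itself is detailed in [9]; purely as an illustration of the idea, the sketch below assigns an AV point to the nearest of a few assumed mood anchors on Thayer's plane. The anchor coordinates and the function name are our own assumptions, not values from the paper.

import math

# Assumed (arousal, valence) anchors for a few of the 11 mood adjectives;
# the actual Mood Mapper is fuzzy and is tuned with user labeling feedback.
MOOD_ANCHORS = {
    'excited':  (0.8,  0.6),
    'happy':    (0.5,  0.8),
    'peaceful': (-0.6, 0.5),
    'sad':      (-0.5, -0.7),
    'angry':    (0.7,  -0.7),
}

def map_av_to_mood(arousal, valence):
    """Return the mood label whose anchor is closest to the given AV point."""
    return min(MOOD_ANCHORS,
               key=lambda mood: math.dist(MOOD_ANCHORS[mood], (arousal, valence)))

print(map_av_to_mood(0.6, 0.7))   # 'happy' for these example anchors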

3.2 Musical features

To classify music mood, we first extract musical features such as intensity, rhythm, scale and harmony from acoustic music. These are fed into the fuzzy inference engine to obtain their arousal/valence values, and then the AV values are mapped to their corresponding mood by the Mood Mapper. Intensity is represented by loudness and its regularity, Rhythm by tempo and the standard deviation of beat durations, and Scale by key, mode and tonality. Finally, Harmony is represented by the harmonic distribution. From these musical features, our Mood Recognizer provides a high-level semantic interpretation of the emotional intentions of the listener and hence can classify them into one of the 11 emotions. Details of the feature extraction process are described in the following subsections.

3.2.1 Intensity

Since a human's hearing organs are stimulated by the loudness of signals, the human auditory system is greatly affected by the intensity of music. Intensity is usually the basis of musical beats, which can evoke a musical emotion. In particular, an intense beat makes the listener feel powerful and tense. For example, typical trance-style music has very strong and dynamic beats, thus evoking a tense atmosphere for listeners.

Table 1 Example relations: musical features correlated with discrete emotions in musical expression (Juslin and Sloboda [10])

Emotion   Musical features

Happy     fast tempo, large pitch variation, few but structured harmonics, major tonality, middle-ground loudness level

Sad       slow tempo, low pitch level, few and somewhat dissonant harmonics, minor tonality, gentle rhythm, predominantly soft loudness level

Peaceful  slow tempo, gradual rise and fall of the melody, gentle rhythm, major and minor alternation, soft loudness level

Angry     quick tempo, high pitch level but small pitch variation, many harmonics, minor tonality, sudden changes in rhythmic patterns, very high loudness level

Nervous   rapid tempo, high pitch level, random and tense dissonant harmonics, minor tonality, sudden bursts of loud playing building to climaxes

Excited   fast tempo, high pitch level and large pitch variation, many harmonics, major tonality

Pleased   fast tempo, low pitch level but large pitch variation, few harmonics, major tonality

Bored     slow tempo, low pitch level and small pitch variation, few harmonics, minor tonality


In order to represent the loudness of music, we calculate the average energy (AE) of the music wave sequence. Also, we measure the regularity of loudness using the standard deviation (σ) of the average energy. These are defined as:

$$AE(x) = \frac{1}{N}\sum_{t=0}^{N} x(t)^2 \qquad (1)$$

$$\sigma\big(AE(x)\big) = \sqrt{\frac{1}{N}\sum_{t=0}^{N}\big(AE(x) - x(t)\big)^2} \qquad (2)$$

where x is an input discrete signal, t is the time in sampling units, and N is the length of x. Usually, AE reflects the overall volume of a song, which means that if the volume is high, AE also has a high value. In addition, Eq. 2 computes the regularity of loudness as the standard deviation (σ) around AE. If the loudness fluctuates frequently, then σ has a high value.
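A minimal NumPy sketch of these intensity features follows. Eq. 1 is implemented directly; for the regularity term we take one plausible reading of Eq. 2 and compute the standard deviation of per-frame average energies, where the frame length is our own assumption.

import numpy as np

def average_energy(x):
    """Eq. 1: mean squared amplitude of the discrete signal x."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** 2)

def loudness_regularity(x, frame_len=1024):
    """Standard deviation of per-frame average energy (one reading of Eq. 2).
    frame_len is an assumption; the paper does not specify a frame size."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.std(np.mean(frames ** 2, axis=1))

# One second of a 440 Hz tone sampled at 44,100 Hz, as a toy input signal.
t = np.arange(44100) / 44100.0
signal = np.sin(2 * np.pi * 440 * t)
print(average_energy(signal), loudness_regularity(signal))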

3.2.2 Rhythm

Rhythm, which is composed of rhythmic features such as tempo and beat, is one of the most important elements in music. Beat is a fundamental rhythmic element of music. It usually represents the periodic length of the 1/4 note. Tempo is usually defined in beats per minute (BPM), which is used to represent the global rhythmic features of music. Different tempos cause listeners to experience diverse emotions. In general, a fast music beat with high tempo usually makes listeners excited and strained, whereas a slow music beat with low tempo is boring and thus gives low tension to listeners. On the other hand, the regularity of beats also evokes various emotions in listeners. Regular beats can make listeners feel calm or even bored, but irregular beats can make listeners feel uneasy or cranky.

Tempo and regularity of beats can be measured in various ways. For beat tracking and tempo analysis, we used the algorithm by Ellis et al. [5]. The features we use are the overall tempo (in beats per minute) and the standard deviation of beat intervals, which indicates tempo regularity.
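The sketch below shows how these two rhythm features could be derived from the beat times that a tracker such as [5] returns; the beat times and the use of the median interval for the overall tempo are illustrative assumptions.

import numpy as np

def rhythm_features(beat_times):
    """Return (tempo_bpm, beat_interval_std) from beat onset times in seconds."""
    beat_times = np.asarray(beat_times, dtype=float)
    intervals = np.diff(beat_times)             # inter-beat intervals in seconds
    tempo_bpm = 60.0 / np.median(intervals)     # overall tempo in beats per minute
    return tempo_bpm, float(np.std(intervals))  # interval std indicates regularity

# Illustrative beat times: a roughly steady 120 BPM pulse with slight jitter.
beats = [0.00, 0.50, 1.01, 1.49, 2.00, 2.52, 3.00]
print(rhythm_features(beats))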

3.2.3 Scale

In modern Western music, the most common scale is the 12-tone equal temperament, which divides the octave into 12 equal parts. A scale is an overall rule of tonic formation. In our study, we defined a scale as the set of key, mode and tonality. A key is a set of musical notes. A key has its own tonic regularity for note representation and composition. For example, E major is composed of E, F#, G#, A, B, C#, D# and E, and these 8 notes appear frequently in E major music. In detail, only the 3rd (G) and 6th (C) notes are changed in E minor. This is called the musical mode property.

Typical music follows the musical mode property in music composition. Such music makes listeners feel peaceful or refreshed. However, there are also some exceptions in music composition, and atonal music has no regularities in key and mode. Sometimes these exceptions can make listeners feel amused and excited, but they can also make them feel discomforted. Thus, there is a strong need to analyze the tonality of music quantitatively.


For accurate scale features, we first analyzed the chromagram representing the frequencies in musical scales. After that, we applied the key profile matrix by Krumhansl [13]. The following equations show the process of combining the chromagram and key characterization:

$$Tonality = C \cdot KeyProfileMatrix \qquad (3)$$

$$Key = \max_{KeyIndex}\big(Tonality(KeyIndex)\big) \qquad (4)$$

where vector C has 12 elements and represents the summed chromagram analyzed over acoustic frames, KeyProfileMatrix is a key profile matrix composed of 12×24 elements, and KeyIndex indexes KeyProfileMatrix, with KeyIndex = 1, 2, …, 24. After taking the product of C and KeyProfileMatrix in Eq. 3, we obtain a tonality score for each key. Finally, we obtain the most appropriate key by picking the one having the maximum tonality in Eq. 4.
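Eqs. 3 and 4 amount to a vector-matrix product followed by an arg-max; the sketch below shows this with NumPy. The randomly filled 12×24 profile matrix is a placeholder; the real system uses Krumhansl's key profiles [13].

import numpy as np

def estimate_key(summed_chroma, key_profile_matrix):
    """Eq. 3: tonality scores from the summed 12-bin chromagram and the
    12x24 key profile matrix; Eq. 4: the key index with maximum tonality."""
    c = np.asarray(summed_chroma, dtype=float)   # shape (12,)
    tonality = c @ key_profile_matrix            # shape (24,)
    return int(np.argmax(tonality)), tonality    # key index in 0..23

# Placeholder inputs; real values come from the chromagram and Krumhansl's profiles.
rng = np.random.default_rng(0)
key_profiles = rng.random((12, 24))
chroma_sum = rng.random(12)
print(estimate_key(chroma_sum, key_profiles)[0])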

3.2.4 Harmonicity

Harmonics can be observed in musical tones. In monophonic music, harmonics are easily observed in the spectrogram. However, it is hard to find harmonics in polyphony, because many instruments and voices are performing at once. To address this problem, we compute the harmonic distribution as

$$H(f) = \frac{\max_{f=1}^{N}\sum_{k=1}^{M}\min\big(\|X(f)\|, \|X(kf)\|\big)}{\frac{1}{N}\sum_{f=1}^{N}\|X(f)\|} \qquad (5)$$

Here, M denotes the maximum number of harmonics considered, k indexes the harmonics, f is the fundamental frequency, and X is the short-time Fourier transform (STFT) of the source signal. In our implementation, we measured harmonicity by computing both its average and its standard deviation.
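A direct NumPy rendering of Eq. 5 on a single magnitude spectrum is sketched below; the toy spectrum, the bin-indexed fundamentals and the cap of five harmonics are illustrative assumptions.

import numpy as np

def harmonicity(spectrum, max_harmonics=5):
    """Eq. 5 on one magnitude spectrum: for every candidate fundamental bin f,
    sum min(|X(f)|, |X(k*f)|) over harmonics k, take the maximum over f, and
    normalise by the mean spectral magnitude."""
    X = np.abs(np.asarray(spectrum, dtype=float))
    n = len(X)
    best = 0.0
    for f in range(1, n):                          # candidate fundamental bins
        total = 0.0
        for k in range(1, max_harmonics + 1):
            if k * f >= n:
                break
            total += min(X[f], X[k * f])
        best = max(best, total)
    return best / X.mean()

# Toy magnitude spectrum of one analysis frame (e.g. one STFT column) with
# energy at a fundamental bin and three of its harmonics.
frame = np.zeros(64)
frame[[4, 8, 12, 16]] = [1.0, 0.8, 0.6, 0.4]
print(harmonicity(frame))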

4 Ontological model

4.1 COMUS ontology

As mentioned above, we need a set of common ontologies for knowledge sharing and reasoning in order to provide a music recommendation service intelligently. We have developed music and related ontologies for the music recommendation domain. We use the W3C-recommended ontology language OWL (Web Ontology Language) to represent the ontology. The OWL language is derived from the DAML+OIL language, and both are layered on top of the standard RDF(S) triple data model (i.e., subject, predicate and object). We adopt the basic concepts and relations from a previous work, the Music Ontology [39, 40], and expand it to include additional features such as musical features, genres, instrument taxonomies, moods and situations. We serialize these ontologies in OWL so that we can retrieve information using the SPARQL query language, and we can use both rule-based and ontology reasoning.


The COMUS ontology consists of about 826 classes and instances, and 61 property definitions. Figures 4 and 5 show a graph representation and an instantiation example for some of the key COMUS ontology definitions. This ontology describes music-related information about relationships and attributes that are associated with people, genres, moods (e.g., sad, happy, gloomy), locations (e.g., home, office, street), times (e.g., morning, spring), and situation events (e.g., waking up, driving, working) in daily life.

The key top-level elements of the ontology are the Person, Situation, Mood, Genre and Music classes and their properties.

- The Music class defines general properties of music such as titles, release year, artists, genres and musical features (e.g., MFCC, Tempo, Onset, Chroma, Segment, Spectral Centroid, Spectral Flux, Spectral Spread and Zero Crossing Rate).

- The Genre class defines the category of music. There has been much research on music genre classification, and several popular online systems such as All Music Guide, MusicBrainz and Moodlogic exist for annotating popular music genres and emotions. We create our own genre taxonomy based on All Music Guide along with a second level of industry taxonomy.

- The Person class defines generic properties of a person such as name, age, gender, hobbies, socioeconomic background (e.g., job, final education) and music-related properties for music recommendation such as musical education and favorite music, genres and singers.

- The Mood class defines a person's state of mind or emotions. Each mood has a set of similar moods. For example, "aggressive" has similar moods such as "hostile, angry, energetic, fiery, rebellious, reckless, menacing, provocative, outrageous, and volatile."

- The Situation class defines a person's situation in terms of conditions and circumstances, which are very important clues for effective music recommendation.

Fig. 4 Graph representation of the COMUS ontology (nodes include Person, Goal, Artist, Mood, Situation, Time, Location, Event, MusicalFeature and mo:musicGenre)


The situation is described by time, location, subject and goals. Hence, this class describes the user's situational contexts such as their whereabouts (Location), what happens to them (Event), etc.

The following questions describe how this ontology might be used for reasoning about the user's situational contexts and recommending appropriate music.

- Q1: What kind of music does he/she listen to when feeling gloomy, lying in bed at night?
- Q2: What kind of music does he/she listen to when taking a walk and enjoying a peaceful afternoon?
- Q3: What kind of music does he/she listen to when waking up late in the morning?
- Q4: What kind of music does he/she listen to when driving to the office in the morning?

Figure 6 shows part of the COMUS ontology in XML syntax. The key top-level elements of the ontology consist of the classes and properties that describe the Person, Situation, Mood, Genre and Music classes. In the following, we briefly describe each of these and show a few SPARQL query examples for the scenario discussed in the next section.

4.2 Application scenario

OWL-DL is a syntactic variant of the Description Logic SHOIN [8], and an OWL-DL ontology corresponds to a SHOIN knowledge base. SHOIN is one of the most expressive DL languages for which decidability has been proved. In this subsection, we briefly present the syntax and semantics of the Description Logic SHOIN. We also introduce a typical scenario that demonstrates how the COMUS ontology can be used to support ontology reasoning for recommending appropriate music to users.

Fig. 5 An example of the COMUS ontology instantiation (a Situation instance such as "InTheMorningWakeUpLately" links hasTime=Morning, hasLocation=Bedroom, hasEvent=WakingUp and hasUserMood=Plaintive, while Mood instances such as Bright and Cheerful are connected through hasSimilarMood)


Below are examples of application scenarios for a situation-aware and user-adaptive music recommendation system.

John is a senior financial accountant who is 43 years old. His favorite singer is "Stevie Wonder," his favorite song is "Superstition," and he likes Pop, Jazz and R&B style music. His hobbies are playing tennis and squash. He is a very positive person and likes bright and sweet music. When he feels nervous, he usually listens to music that might help him feel relaxed.

Scenario #1 (Consideration of musical tastes and listening habits): The date is Wednesday, 25 March 2009 and the time is 7 p.m. John usually listens to calm, soft and sweet music when having dinner at home. The music recommendation system considers his listening history, his preferred artists and genres, as well as other people whose musical tastes are similar to his own on a weekday evening. Based on these preferences and history information, the system finally recommends sweet, soft music such as "I love you for sentimental reasons" by Laura Fygi, or "Will you still love me tomorrow" by Inger Marie.

Scenario #2 (Consideration of situation and mood): The date is 13 April 2009. John usually leaves home before 7 a.m. due to the rush-hour traffic and morning meeting time, and he usually listens to fast-beat songs during his driving commute. However, John woke up late on Monday morning and is very tired because he was working late last night. But he must go to work early to prepare for a presentation at the Monday morning meeting. The music recommendation system therefore recommends some relaxing/peaceful songs instead of the fast-beat songs suggested by his listening history.

This scenario assumes that John has set his musical preferences (such as singers, genres and moods) to filter the query result automatically. For example, the wake-up-late situation is analyzed and sent to the recommendation system.

Fig. 6 Part of the COMUS ontology represented in XML syntax


Using this information, the system will reason about John's situational context and his favorite mood from the user profile information. From this information, the system recommends a set of music which best fits John's interests and current situation.

A SHOIN knowledge base (KB) is based on sets NR (role names), C (atomic concepts) and I (individuals). In the following, this vocabulary is implicit and we assume that A, B are atomic concepts, a, b, i are individuals, and R, S are roles. These can be used to define concept descriptions employing the constructors in the upper part of Table 2. We use C, D to denote concept descriptions. Moreover, a SHOIN KB consists of two finite sets of axioms that are referred to as the TBox and the ABox. The possible axiom types for each are displayed in the lower part of Table 2. We describe part of our ontology using the Description Logic syntax in Table 3.

A more flexible reasoning system can be deployed by specifying user-defined reasoning rules aimed at defining high-level conceptual contexts, such as "What music does the user want to listen to when he/she is stressed?", that can be deduced from the relevant low-level context. Table 4 describes the context reasoning rules used to infer implicit knowledge about the situation from the current knowledge base in Table 3.

The user can ask questions about the situation via queries. The difference between querying a database and querying an OWL knowledge base is that implicit facts are inferred as well as facts that have been explicitly asserted. When we have a situation goal, we can query the music ontology to find a list of recommended music. An example of a SPARQL query and its result is shown in Fig. 7.
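The deployed system issues such queries through the Jena SPARQL engine; as an illustration only, the sketch below runs a comparable query with Python's rdflib. The file name, namespace URI and property names are assumptions loosely modelled on Fig. 5 and Table 3, not the actual COMUS vocabulary.

from rdflib import Graph

# Assumption: the COMUS ontology is available locally as RDF/XML; the file
# name and the namespace below are placeholders.
g = Graph()
g.parse("comus.owl", format="xml")

query = """
PREFIX comus: <http://example.org/comus#>
SELECT ?music ?desired
WHERE {
    ?situation comus:hasSituationName "WakeUpLate" .
    ?situation comus:hasUserMood      ?current .
    ?current   comus:hasSimilarMood   ?desired .
    ?music     comus:hasMood          ?desired .
}
"""
# Each result row pairs a candidate music resource with the inferred desired mood.
for row in g.query(query):
    print(row.music, row.desired)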

5 Experiment

In this section, we describe the ontology development process and present the results of the experiments we performed for context reasoning.

Table 2 SHOIN syntax and semantics

Name                              Syntax     Semantics

inverse role                      R⁻         {(x,y) | (y,x) ∈ R^I}
top                               ⊤          Δ^I
bottom                            ⊥          ∅
nominal                           {i}        {i^I}
negation                          ¬C         Δ^I \ C^I
conjunction                       C ⊓ D      C^I ∩ D^I
disjunction                       C ⊔ D      C^I ∪ D^I
universal restriction             ∀R.C       {x | (x,y) ∈ R^I implies y ∈ C^I}
existential restriction           ∃R.C       {x | for some y ∈ Δ^I, (x,y) ∈ R^I and y ∈ C^I}
(unqualified) number restriction  ≤ n R      {x | #{y ∈ Δ^I | (x,y) ∈ R^I} ≤ n}
                                  ≥ n R      {x | #{y ∈ Δ^I | (x,y) ∈ R^I} ≥ n}
role inclusion (TBox)             S ⊑ R      S^I ⊆ R^I
transitivity (TBox)               Trans(S)   S^I is transitive
general concept inclusion (TBox)  C ⊑ D      C^I ⊆ D^I
concept assertion (ABox)          C(a)       a^I ∈ C^I
role assertion (ABox)             R(a, b)    (a^I, b^I) ∈ R^I


5.1 Data set

The music dataset for our recommendation system is made up of 330 Western pop songs. We collected 30 songs in each of the 11 categories of emotion (see Fig. 1) from the large music database All Music Guide [1], which provides 180 emotional categories for classifying entire songs. All the collected music files are stored as 41,000 Hz, 16-bit, 2-channel stereo PCM.

5.2 Ontology development

According to [7], the development of an ontology is motivated by scenarios that arise in the application. A motivating scenario provides a set of intuitively possible solutions to the problems in the scenario. The COMUS ontology is a collection of terms and definitions relevant to the motivating scenarios of music recommendation that we described above. Thus, to build the ontology, it is important to start by describing the basic concepts and one or more scenarios in the specific domain of interest.

After building the ontology, the next step is to formulate competency questions. These are also based on the scenarios and can be considered as expressiveness requirements in the form of questions. The ontology must be able to represent these questions using its domain-related terminology, and characterize the answers using the axioms and definitions. Therefore, we asked participants to answer the competency questions via the online questionnaire system.

In the experiment, we had about 30 participants; some were musically trained and others were not. Participants were asked to fill out the questionnaire in order to collect suitable terms and definitions about situations and moods. They were also requested to describe their own emotional state transitions, such as current and desired emotions, in the specific scenario. The description was based on one or more emotional adjectives such as happy, sad, angry, nervous and excited, which were collected from the All Music Guide taxonomy. Finally, the most frequently described adjectives were chosen to define the instances in the COMUS ontology.

Table 3 Example of TBox and ABox

TBox

Agent ⊑ Object                            Person ⊑ Agent
Man ⊑ Person                              Woman ⊑ Person
Person ⊓ ∀hasJob.Job ⊑ Worker             performer ≡ performedBy⁻
Music ⊓ ∀similarMood ⊑ Person             Time ⊓ Location ⊓ Object ⊓ Event ⊑ Situation

ABox

Person(John)                              Music(Superstition)
Person(StevieWonder)                      hasJob(StevieWonder, "Artist")
hasName(John, "John")                     performer(Superstition, "Stevie Wonder")
hasName(StevieWonder, "Stevie Wonder")    hasHobby(John, "Baseball")
hasJob(John, "financial accountant")      hasGenre(Superstition, "Pop")
has(Time, "Morning")                      likeMood(John, "bright")
hasLocType(Location, "Street")            hasMood(Superstition, "bright")
event(TrafficJam)                         similarMood("bright", "sweet")
Situation(WakeUpLateTrafficJam)           likeGenre(John, "Jazz")
likeGenre(John, "Pop")                    …


After building the ontology, we performed an experiment to measure the level of user satisfaction when using either our proposed COMUS ontology or the AMG taxonomy in our system. The procedure for the experiment was as follows.

1) The experimenter explained the procedure and the purpose of the experiment and demonstrated how to run our music recommendation system.

2) The participant had to describe his profile (e.g., musical preferences) using web form interfaces such as buttons, textboxes, checkboxes and selection lists.

3) All the participants were told to describe the situation and their current and desired emotions, or to just select a predefined scenario using the query interfaces.

4) The system returned recommended songs based on ontology reasoning and the participant's profile. The participant judged which ones were appropriate for their current emotion, using 5-point rating scales (from 1 – strongly unsatisfied to 5 – strongly satisfied).

5) All the participants were asked to fill out a questionnaire.

As shown in Table 5, over 80% of the participants responded positively in terms of overall satisfaction with the system when using the ontology instead of the AMG taxonomy.

Table 4 Example rules

Rule example

(deftemplate person (slot name) (slot age (type INTEGER)) (slot job) (slot likeMusic) (slot likeMood))
(deftemplate music (slot title) (slot genre) (slot performer) (slot hasMood))
(deftemplate mood (slot name))
(deftemplate situation (slot situation-name) (slot subject) (slot goal) (slot event) (slot time))
(deftemplate event (slot event-name))
(deftemplate location (slot location-name))
(deftemplate time (slot time-name) (slot value))
(deftemplate artist (slot name) (slot perform))
(deftemplate similarMood (slot mood1) (slot mood2))

(defrule wake-up-late
  (person (name ?name))
  (location (location-name "bed"))
  (time (time-name "Morning") (value 8))
  (event (event-name "wake-up"))
  =>
  (assert (situation (situation-name "wake-up-late") (subject ?name))))

(defrule situation-traffic-jam
  (person (name ?name))
  (location (location-name "street"))
  (event (event-name "traffic jam"))
  =>
  (assert (situation (situation-name "situation-traffic-jam") (subject ?name))))

(defrule situation-traffic-jam-morning
  (situation (situation-name "situation-traffic-jam"))
  (time (time-name "morning"))
  =>
  (assert (situation (situation-name "situation-traffic-jam-morning"))))


The satisfaction ratings show that most of the users were satisfied with the query results recommended by the system.

With regard to satisfaction with the participants' preferred emotional adjectives, as depicted in Table 6, positive adjectives (such as happy and excited) were considered satisfactory by about 78% of the participants, whereas ambiguous adjectives such as "nervous" were considered satisfactory by only 43% of the participants (in the case of using the COMUS ontology).

5.3 Experimental results for effectiveness of our system

In order to measure the effectiveness of our recommendation system, we adopted metrics commonly used in information retrieval systems. The effectiveness of our music recommendation is measured by Precision, Recall and F-measure, which are defined as follows:

$$P = \frac{N_S}{N_T}, \qquad R = \frac{N_S}{N_R} \qquad (6)$$

where N_S is the number of songs selected as relevant by the user from the recommended list, N_T is the total number of songs in the recommended list, and N_R is the total number of relevant songs classified by All Music Guide (AMG) moods and situations. The F-measure represents the harmonic average of precision and recall. The F-measure was derived by van Rijsbergen [37]; F_β "measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision."

Fig. 7 Example of SPARQL query

Table 5 Participants' opinions about the system using either the COMUS ontology or the AMG taxonomy (number of participants per rating)

                 1 (unsatisfied)   2   3 (neutral)   4   5 (satisfied)
AMG Taxonomy     1                 3   19            5   2
COMUS Ontology   0                 2   4             16  8


It is based on van Rijsbergen's effectiveness measure (E-measure), defined as follows:

$$E = 1 - \frac{\beta^2 PR + PR}{\beta^2 P + R} \qquad (7)$$

where P is the precision, R is the recall and β is the measure of the relative importance of P or R. The F-measure can then be defined by the simple equation F = 1 − E. Figure 8a–d shows the results for precision, recall, F-measure and 11-point precision-recall, respectively.

Comparing the content-based filtering (CBF) method (namely, score-based recommendation) with ontology-based recommendation (OBR), both the precision and the F-measure of the OBR method are better than those of the CBF method. Overall, we can see that our proposed ontology-based recommendation system outperforms conventional recommendation methods such as CBF.

5.4 Comparison of ontology-based reasoning and rule-based reasoning

In this experiment, we compare the performance of ontology-based reasoning and rule-based reasoning on two different hardware platforms: one with 1 GB of memory and a P4 2.4 GHz CPU, and the other with 512 MB of memory and a P3 1.6 GHz CPU. The dataset size is measured in terms of the number of RDF triples, which are represented in subject, verb and object (SVO) form. Our COMUS ontology contains 826 OWL classes and instances, which are parsed into 3,645 RDF triples.

Figure 9 shows the results. As expected, the performance of ontology-based reasoning depends on the following factors: the size of the ontology, the complexity of the reasoning rules and the CPU speed. The performance comparison conducted using different datasets and machine hardware configurations indicates that context reasoning based on logic is a computationally intensive task. That is, the time required for ontology-based reasoning increases dramatically as the size of the ontology increases. From the results, we can see that ontology-based reasoning largely depends on the size of the ontology, while rule-based reasoning depends on the complexity of the rule sets.

Table 6 Participants’ preferred emotional adjectives

           AMG Taxonomy          COMUS Ontology
           1   2   3   4   5     1   2   3   4   5

Angry      4   5   9   8   4     1   6   11  9   3
Bored      9   12  6   2   1     1   3   6   11  9
Calm       2   6   9   9   4     2   5   12  7   4
Excited    3   4   6   8   9     0   3   6   8   13
Happy      4   8   10  6   2     0   2   2   9   17
Nervous    3   15  8   4   0     4   5   8   8   5
Peaceful   3   7   8   6   6     0   3   5   14  8
Pleased    6   7   13  3   1     0   1   5   9   15
Relaxed    6   7   12  3   2     1   4   12  6   7
Sad        4   8   16  2   0     0   1   14  11  4
Sleepy     0   9   11  6   4     2   4   6   12  6


6 Conclusions and future work

In this paper, we presented an ontology-based context model that is feasible and necessary for supporting context modeling and reasoning in music recommendation. We modeled the musical domain and captured low-level musical features and several musical factors to describe music moods and music-related situations composed of time, location and subject in order to build the ontology. We constructed the musical ontology based on the current Music Ontology as part of an ongoing project on building an intelligent music recommendation system.

Fig. 9 Experimental results of the reasoning performance

Fig. 8 Effectiveness of our recommendation system


In addition, to show its feasibility, we set up a usage scenario and presented several queries for inferring useful information from the ontology.

Currently, the COMUS ontology still has some limitations and open issues regarding reasoning and recommendation. Therefore, we are extending our reasoning model and ontology to support more precise music recommendation. In future work, we will consider using other contextual information such as tags or texts from social network services (e.g., Last.fm [15]). Also, lyrics could help to obtain more mood/emotion information by extracting meaningful keywords and expanding them using WordNet [38].

Acknowledgment This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2011-C1090-1101-0008).

References

1. All Music Guide, Available at: http://allmusic.com
2. Birmingham W, Dannenberg R, Pardo B (2006) An introduction to query by humming with the vocal search system. Commun ACM 49(8):49–52
3. Cano P et al (2005) Content-based music audio recommendation. In: Proc of ACM Multimedia, pp 211–212
4. CYC upper ontology, Available at: http://www.cyc.com/cycdoc/vocab/vocab-toc.html
5. Ellis DPW, Poliner GE (2007) Identifying 'cover songs' with chroma features and dynamic programming beat tracking. In: IEEE Conf Acoustics Speech Signal Process (ICASSP), IV:1429–1432
6. GarageBand, Available at: http://www.garageband.com/
7. Grüninger M, Fox MS (1994) The role of competency questions in enterprise engineering. In: IFIP WG 5.7 Workshop on Benchmarking: Theory and Practice, Trondheim, Norway
8. Horrocks I, Sattler U (2005) A tableaux decision procedure for SHOIQ. In: Proc of the 19th Int Joint Conf on Artificial Intelligence (IJCAI 2005), Morgan Kaufmann
9. Jun S, Rho S, Han B, Hwang E (2008) A fuzzy inference-based music emotion recognition system. In: International Conference on Visual Information Engineering, pp 673–677
10. Juslin PN, Sloboda JA (2001) Music and emotion: theory and research. Oxford University Press, New York
11. Kanzaki Music Vocabulary, Available at: http://www.kanzaki.com/ns/music
12. Klaus RS, Marcel RZ (2001) Emotional effects of music: production rules. In: Music and emotion: theory and research. Oxford University Press, Oxford
13. Krumhansl C (1990) Cognitive foundations of musical pitch. Oxford University Press
14. Krzysztof J, Przemyslaw K, Katarzna M (2010) Personalized ontology-based recommender systems for multimedia objects. In: Agent and multi-agent technology for internet and enterprise systems. Studies in Computational Intelligence, vol 289, pp 275–292
15. Last.fm, Available at: http://www.last.fm
16. List T, Fisher RB (2004) CVML – an XML-based computer vision markup language. In: Proceedings of the 17th International Conference on Pattern Recognition, pp 789–792
17. Lu L, Liu D, Zhang H-J (2006) Automatic mood detection and tracking of music audio signals. IEEE Trans Audio Speech Lang Process 14(1):5–18
18. Mood Logic, Available at: http://www.moodlogic.com/
19. MPEG-7, Available at: http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
20. MusicBrainz, Available at: http://musicbrainz.org
21. MyStrands, Available at: http://www.mystrands.com/
22. Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University Press, Cambridge
23. Oscar C (2008) Foafing the music: bridging the semantic gap in music recommendation. J Web Semant 6(4):250–256
24. Oscar C, Perfecto H, Xavier S (2006) A multimodal approach to bridge the music semantic gap. In: Semantic and Digital Media Technologies (SAMT)
25. OWL Web Ontology Language, Available at: http://www.w3.org/TR/owl-ref/
26. Paulo N et al (2006) Emotions on agent based simulators for group formation. In: Proceedings of the European Simulation and Modeling Conference, pp 5–18
27. Pauws S, Eggen B (2002) PATS: realization and user evaluation of an automatic playlist generator. In: Proceedings of ISMIR, pp 222–227
28. Poppe C, Martens G, Potter PD, Walle RVD (2010) Semantic web technologies for video surveillance metadata. Multimedia Tools and Applications, published online
29. Protégé Editor, Available at: http://protege.stanford.edu
30. Rho S, Han B, Hwang E, Kim M (2008) MUSEMBLE: a novel music retrieval system with automatic voice query transcription and reformulation. J Syst Softw 81(7):1065–1080
31. Richard A, Raphaël T, Steffen S, Lynda H (2009) COMM: a core ontology for multimedia annotation. In: Handbook on ontologies. International Handbooks on Information Systems, pp 403–421
32. Ruebenstrunk G, Emotional computers. Available at: http://ruebenstrunk.de/emeocomp/content.HTM
33. Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39
34. Song S, Rho S, Hwang E, Kim M (2009) Music ontology for mood and situation reasoning to support music retrieval and recommendation. In: Proceedings of the International Conference on Digital Society, Cancun, Mexico, pp 304–309
35. Thayer RE (1989) The biopsychology of mood and arousal. Oxford University Press, New York
36. W3C RDF Specification, Available at: http://www.w3c.org/RDF
37. Wikipedia, Available at: http://en.wikipedia.org/wiki/Information_retrieval
38. WordNet, Available at: http://wordnet.princeton.edu/
39. Yves R, Frederick G, Music ontology specification. Available at: http://www.musicontology.com/
40. Yves R, Samer A, Mark S, Frederick G (2007) The music ontology. In: Proc Int Conf Music Inf Retrieval (ISMIR 2007), pp 417–422

Seungmin Rho received his MS and PhD degrees in Computer Science from Ajou University, Korea, in 2003 and 2008, respectively. In 2008-2009, he was a Postdoctoral Research Fellow at the Computer Music Lab of the School of Computer Science at Carnegie Mellon University. He is currently working as a Research Professor at the School of Electrical Engineering in Korea University. His research interests include database, music retrieval, multimedia systems, machine learning, knowledge management and intelligent agent technologies. He has been a reviewer for Multimedia Tools and Applications (MTAP), Journal of Systems and Software, and Information Science (Elsevier), and a Program Committee member in over 15 international conferences. He has published 17 papers in journals and book chapters and 25 in international conferences and workshops. He is listed in Who's Who in the World.


Seheon Song received his B.S. and M.S. degrees in Computer Science from Ajou University, Korea, in 2001 and 2003, respectively. Currently he is pursuing a Ph.D. degree in the Computer Science Department of Ajou University. He is currently working on intelligent agent modeling and system development. His research interests include context-awareness, knowledge representation and reasoning, and intelligent agent technologies. Mr. Song is a student member of the IEEE.

Yunyoung Nam received the B.S., M.S. and Ph.D. degrees in computer engineering from Ajou University, Korea, in 2001, 2003, and 2007, respectively. He has been a research engineer in the Center of Excellence in Ubiquitous System (CUS) since 2007 and a post-doctoral researcher at Stony Brook University, Stony Brook, NY, USA, since 2009. His research interests include multimedia databases, ubiquitous computing, image processing, pattern recognition, context-awareness, conflict resolution, wearable computing, and intelligent video surveillance.


Eenjun Hwang received his BS and MS degrees in Computer Engineering from Seoul National University, Seoul, Korea, in 1988 and 1990, respectively, and his PhD degree in Computer Science from the University of Maryland, College Park, in 1998. From September 1999 to August 2004, he was with the Graduate School of Information and Communication, Ajou University, Suwon, Korea. Currently he is a member of the faculty in the School of Electrical Engineering, Korea University, Seoul, Korea. His current research interests include database, multimedia systems, audio/visual feature extraction and indexing, semantic multimedia, information retrieval and Web applications.

Minkoo Kim received his B.S. degree in Computer Engineering from Seoul National University, Seoul, Korea, in 1977, and his M.S. degree in Computer Engineering from KAIST (Korea Advanced Institute of Science and Technology), Daejeon, Korea, in 1979. He received his Ph.D. degree in Computer Science from the Pennsylvania State University in 1989. From January 1999 to January 2000, he was with the University of Louisiana, CACS, as a visiting researcher. Since 1981, he has been a member of the faculty in the College of Information Technology, Ajou University, Suwon, Korea. His current research interests include multi-agent systems, information retrieval, ontology and its applications.
