Using sonification for mining time series data

Mark Last and Anna Gorelik
Project Web Page: http://iui.bgu.ac.il/projects/sonification/en/experiment_examples.html
9th Intl. Workshop on Multimedia Data Mining (MDM/KDD 2008), Las Vegas, NV, USA, August 24, 2008



Outline

• Motivation and Background

• The Sonification Procedure

• Sample Results

• Empirical Evaluation

• Conclusions and Future Research


Last and Gorelik (BGU)

Motivation

• There is a growing interest in mining time series databases by both automated and interactive tools.
• For a long time, visualization has proven to be an effective approach to interactive analysis of time-series data.
  • "One picture = 1,000 words" (畫意能達萬)
• Sonification tries to use sound to represent, and possibly analyze, data in an interactive fashion.
• Is "sound ≥ picture" for various interactive time series mining tasks?


Mining Time Series Databases

Time Series Database (TSDB)
• Each record contains sequential information (usually represented by a time-stamp)

Examples
• Stock prices, process control data, meteorological measurements

Two basic categories
• Single-variate (univariate) time series
• Multivariate time series

TSDB mining tasks
• Indexing, segmentation, event detection, clustering, classification, forecasting, etc.

Univariate Time Series

Multivariate Time Series


What is Sonification?

Sonification
• The use of non-speech audio to convey information

Traditional sonification applications
• Sonar, Morse code, telephone ring, smoke alarm, Geiger counter, metal detector, etc.

[Diagram labels: Data; Sonification Algorithm]


Why Sonification?
Main Advantages for Mining TSDB

• The remarkable bandwidth of the human aural system
  • Our right brain can absorb and integrate the horn sounds made by other drivers as well as the music from the car radio
• The human aural system reacts faster to sudden events than the visual system
• The sequential nature of sound
  • As opposed to visual images, which can be distorted quite easily, a sound sequence has to be heard in a given order
• An advantage for visually impaired users and for people whose eyes are already overloaded with visual information (pilots, drivers, etc.)


Related Work
Sonification of Time Series Data

• Combining visual and audio information (Noirhomme-Fraiture et al., 2002; Nesbitt and Barrass, 2004)
• Domain-specific tools (El-Azm, 2005; Farkas, 2006; Walker et al., 2006, 2007)
• An Interactive Sonification Toolkit (Pauletto and Hunt, 2004)
• No empirical evaluation of using sonification by real users for interactive data mining tasks


Goal and Contributions

Research Goal
• To develop and evaluate a sonification technique for time series databases so that humans can perform interactive data mining tasks without the need to view the actual data.

Unique Contributions
• Using a segmentation algorithm to represent high-speed and noisy time series in a compressed form
• Extracting some basic features of Western tonal music, such as melody, rhythm, dynamics, and timbre
• Conducting two user studies that show that, using the music representation only, subjects can perform some basic data mining tasks "on the fly"


The Sonification Procedure
Main Steps

• Segmentation Algorithm
  • Based on SWAB (Sliding Windows and Bottom Up) by Keogh et al. (2003)
• SEG->MSQ Sonification Algorithm (translating a segmented database to the MSQ format)
  • The algorithm is applied separately to each attribute (time series) in the given database (the "vertical" mode)
  • Recommendation: up to 3-4 continuous and 1-3 nominal attributes
  • The output is merged together by the mergeMSQ algorithm
• MSQ->MIDI / MP3 Translation
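The segmentation step can be illustrated with a small sketch. This is not the authors' code and not full SWAB: it shows only a plain bottom-up merge, the batch half of SWAB, and it approximates each segment by the straight line joining its end points. The function names `fit_error` and `bottom_up_segments` and the `max_error` threshold are assumptions made for illustration.

```python
def fit_error(ys):
    """Sum of squared residuals of the line joining the first and last point."""
    n = len(ys)
    if n < 3:
        return 0.0
    slope = (ys[-1] - ys[0]) / (n - 1)
    return sum((ys[0] + slope * i - y) ** 2 for i, y in enumerate(ys))


def bottom_up_segments(series, max_error):
    """Merge adjacent segments while the cheapest merge stays within max_error.

    Returns a list of (start, end) index pairs, with end inclusive.
    """
    # Start from the finest segmentation: pairs of neighbouring points.
    bounds = [(i, min(i + 1, len(series) - 1)) for i in range(0, len(series), 2)]
    while len(bounds) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [
            fit_error(series[bounds[k][0]: bounds[k + 1][1] + 1])
            for k in range(len(bounds) - 1)
        ]
        k = min(range(len(costs)), key=costs.__getitem__)
        if costs[k] > max_error:
            break  # every remaining merge would exceed the error budget
        bounds[k:k + 2] = [(bounds[k][0], bounds[k + 1][1])]
    return bounds


# A rise followed by a fall collapses into two linear segments.
print(bottom_up_segments([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], max_error=0.5))
```

Each resulting segment can then be sonified as a unit, which is what gives the compressed representation of a long or noisy series.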


The Vertical Mode of Sonification

Sonification of continuous data
• Representation by changes of musical pitch
• Representation by changes of volume (multivariate time series only)

Sonification of nominal data
• Attributes are represented by different musical instruments (usually some drums or bells)
• Attribute values are mapped to different musical pitches
• Only one ("default") value may be sonified
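A minimal sketch of these mappings, assuming a simple linear rescaling onto MIDI note numbers (0-127, where 60 is middle C) and MIDI velocities. The slides do not give the actual mapping or ranges, so the function names, the pitch range 48-84, and the velocity range 30-120 are all illustrative assumptions.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly rescale value from [lo, hi] to an integer in [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) // 2  # constant series: use the middle of the range
    return round(out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo))


def continuous_to_pitches(values, pitch_range=(48, 84)):
    """Pitch-based sonification: higher values give higher notes."""
    lo, hi = min(values), max(values)
    return [scale(v, lo, hi, *pitch_range) for v in values]


def continuous_to_velocities(values, velocity_range=(30, 120)):
    """Volume-based sonification: higher values give louder notes."""
    lo, hi = min(values), max(values)
    return [scale(v, lo, hi, *velocity_range) for v in values]


def nominal_to_pitches(values, pitch_of):
    """Map nominal values to pitches; values missing from pitch_of stay silent (None)."""
    return [pitch_of.get(v) for v in values]


temperature = [10, 20, 30]
print(continuous_to_pitches(temperature))     # one note per observation
print(continuous_to_velocities(temperature))  # one loudness value per observation
# Sonifying a single value only, e.g. rain events on a bell track:
print(nominal_to_pitches(["dry", "rain", "dry"], {"rain": 76}))
```

Passing a one-entry `pitch_of` dictionary corresponds to the case where only one value of a nominal attribute is sonified.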


Sonification Examples 1
Pitch-Based Sonification
Univariate Continuous Time Series

• Financial time-series: daily closing price of a stock (20 observations)
• Weather time-series: a river water level every half year (20 observations)


Sonification Examples 2
Pitch-Based Sonification
Univariate Continuous Time Series

Financial time-series:
• Currency exchange rate (73 observations)
• Daily closing price of a stock (125 observations)
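To make the final translation to MIDI concrete, here is a rough sketch that writes a sequence of pitches as a single-track Standard MIDI File using only the standard library. It does not reproduce the MSQ format or the authors' translator, which the slides do not describe. The function names, the fixed note length, and the 480 ticks-per-quarter-note resolution are assumptions.

```python
import struct


def var_len(n):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # continuation bit on all but the last byte
        n >>= 7
    return bytes(reversed(out))


def notes_to_midi(pitches, ticks_per_note=480, velocity=90, program=0):
    """Build a format-0 MIDI file that plays the pitches one after another."""
    track = bytearray()
    track += var_len(0) + bytes([0xC0, program])                # choose the instrument
    for p in pitches:
        track += var_len(0) + bytes([0x90, p, velocity])        # note on, channel 0
        track += var_len(ticks_per_note) + bytes([0x80, p, 0])  # note off
    track += var_len(0) + b"\xFF\x2F\x00"                       # end-of-track meta event
    # Header: chunk length 6, format 0, one track, 480 ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 480)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


data = notes_to_midi([60, 64, 67])  # C, E, G
with open("example.mid", "wb") as f:
    f.write(data)
```

Changing `program` selects a different General MIDI instrument, which is one way to give each attribute its own timbre.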

Sonification Examples 3
Pitch-Based Sonification
Multivariate (Continuous + Discrete) Time Series

Weather time-series database:

Weather observations are a typical example of complex multivariate time series data that may include both continuous numeric observations (temperature) and discrete "event" data (e.g. rain and snow episodes).

[Figure: two panels labeled January 2007 and April 2007, plotting daily temperature together with rain events and with snow events; x-axis: day of month]


Sonification Examples 4
Pitch-Based and Volume-Based Sonification
Multivariate Continuous Time Series

Weather time-series databases

Weather observations may include a number of continuous numeric observations

• July 1978 min and max temperature
• Summer 2007 vs. average temperature 2003-2006


Empirical Evaluation - Questions

• Can people use our sonification algorithm to detect similarity of two time series?
• Can people hear and find consistent patterns in the data?
• How do people interpret this representation and use it to make decisions?
• Can people perform standard data mining tasks, like classification and clustering, using our auditory representation of the data?


1st Experiment: Design & Variables

Input
• Univariate time series: one continuous attribute in each database.

Algorithm
• Pitch-based sonification

Experiment web site (in English and Hebrew)
• http://iui.bgu.ac.il/projects/sonification/

Subjects: 44 (27 males and 17 females)

Data privacy
• The subjects were not asked to provide any identifying information


Test Structure

Introduction
• A brief explanation of our approach to sonification of time series data.

Training session
• Examples of typical questions and their solutions

Personal and demographic information
• Username (fictional), age, gender, occupation.

Musical ability
• Musical experience (number of practice years, musical pitch ability).

Evaluation session
• Five sets of questions, related to the following data mining tasks: trend detection, classification, and similarity search
• Maximum total score for the test: 20 points


1st Experiment: Main Results

• The answers to all questions were significantly better than answers given by chance
• No statistically significant difference between total scores for different:
  • age groups (p = 0.71, p > 0.05)
  • gender groups (p = 0.13, p > 0.05)
  • occupation groups (p = 0.77, p > 0.05)
• There is a statistically significant difference between total scores for different musical experience groups (p = 0.001, p < 0.05)
  • The number of practice years does not matter!
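The slides do not say which statistical test produced these p-values. As one generic way to compare total scores between two groups, the sketch below runs a two-sided permutation test on the difference of group means. The scores are made up for illustration and are not the study's data; the function name and parameters are likewise assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of mean scores."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical total scores (out of 20) for two musical-experience groups.
with_experience = [18, 17, 19, 16, 18, 17]
without_experience = [11, 12, 10, 13, 12, 11]
p = permutation_p_value(with_experience, without_experience)
print(p, "significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

A p-value below 0.05 would be read the same way as on the slide: the difference between the groups is unlikely to be due to chance alone.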


2nd Experiment: Design & Variables

Input
• Multivariate time series: one continuous and one nominal attribute, or two continuous attributes, in each database.

Algorithms
• Pitch-based sonification of nominal and continuous attributes
• Volume-based sonification of continuous attributes
• Pitch-based sonification of continuous attributes, where each attribute is represented by a different instrument

Instruments
• Piano and flute timbres to represent continuous attributes
• Bell sound to represent the nominal attribute

Subjects: 37 (21 males and 16 females)


2nd Experiment: Results

• Our technique is better than random decisions for each of the questions
• On average, more than 2/3 (66%) of the answers were correct
• No statistically significant difference between total scores for different:
  • age groups (p = 0.75, p > 0.05)
  • gender groups (p = 0.64, p > 0.05)
  • occupation groups (p = 0.21, p > 0.05)
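A claim of the form "better than random decisions" on a single question can be checked with a one-sided exact binomial test. The sketch below is only an illustration: the slides give neither the per-question counts nor the number of answer options, so the 25 correct answers and the chance level of 1/3 are assumed values. Only the 37 subjects come from the experiment design.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed: 25 of 37 subjects answer a three-option question correctly.
p_value = binomial_tail(25, 37, 1 / 3)
print(f"p = {p_value:.2e}")
```

The smaller this tail probability, the less plausible it is that the subjects were simply guessing.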


2nd Experiment: Evaluation

Analysis of Occupation attribute (2 reorganized groups):

There is a statistically significant difference between total scores for the two reorganized groups of the occupation attribute (p = 0.005, p < 0.05)


2nd Experiment: Evaluation

Analysis of Musical Experience attribute:

There is a statistically significant difference between total scores for different musical experience groups (p = 0.00005, p < 0.05)


2nd Experiment: Evaluation

Analysis of Musical Hearing Ability attribute:

There is a statistically significant difference between total scores for different musical hearing ability groups (p = 0.0005, p < 0.05)


Summary & Conclusions

• A domain-independent sonification technique for representing univariate and multivariate TSDB
• Designed for interactive data mining tasks, though not limited to them
• Using a segmentation algorithm to compress the original data
• Empirical evaluation: two user studies
• Future research
  • Finding the best settings of the sonification algorithm for each data mining task and user type
  • Developing an "online" version of the proposed algorithm