Machine learning methods for the market segmentation of the performing arts audiences

356 Int. J. Business Environment, Vol. 2, No. 3, 2009

Copyright © 2009 Inderscience Enterprises Ltd.


María M. Abad-Grau* ETS Ingeniería Informática Departamento de Lenguajes y Sistemas Informáticos Universidad de Granada Granada 18071, Spain Fax: +34958243179 E-mail: [email protected] *Corresponding authors

Mária Tajtáková Faculty of Commerce Department of Marketing University of Economics in Bratislava Dolnozemská cesta 1, 852 35 Bratislava, Slovakia Fax: +421 2 624 12 302 E-mail: [email protected]

Daniel Arias-Aranda* Facultad de Ciencias Económicas y Empresariales Departamento de Organización de Empresas Campus de Cartuja s/n Universidad de Granada Granada 18071, Spain Fax: +34958246222 E-mail: [email protected] *Corresponding authors

Abstract: The interaction of human experts with machine learning and data mining tools leads to improved results in decision-making support systems. In marketing decisions related to market segmentation, the use of only one technique does not guarantee an optimal solution, as such a solution may not even be achievable. In this paper, we analyse the market segmentation decisions in the performing arts through a combination of expert opinions and machine learning algorithms in order to obtain a consensual model that allows a better understanding of market preferences together with a deep knowledge about reliability in the obtained results. The results and data were applied to build a model of market segmentation of students based on their attendance in, attitudes towards, and intentions in attending opera and ballet performances.


Keywords: market segmentation; machine learning; data mining; performing arts; opera; ballet; business environment.

Reference to this paper should be made as follows: Abad-Grau, M.M., Tajtáková, M. and Arias-Aranda, D. (2009) ‘Machine learning methods for the market segmentation of the performing arts audiences’, Int. J. Business Environment, Vol. 2, No. 3, pp.356–375.

Biographical notes: María M. Abad-Grau is an Assistant Professor of Computer Science at the University of Granada, Spain. In 2001, she received a PhD in Computer Science from the University of Murcia (Spain). She has been a Visiting Scholar under a Fulbright grant in projects related to Machine Learning applied to Genomics at the University of Massachusetts and Boston University in 2003 and 2004. Her main research interests are machine learning, information systems and bio-informatics.

Mária Tajtáková is an Associate Professor of Marketing at the University of Economics in Bratislava (Slovakia). She did her PhD in Arts Marketing and its Application in Opera Houses in Slovakia. Her current research focuses on consumer behaviour and audience development in the arts and culture. She worked as a Marketing Manager for Opera and Ballet at the Slovak National Theatre in 2003 and 2004 and as a Consultant of the Ministry of Culture of the Slovak Republic in 2005 and 2006.

Daniel Arias-Aranda is a Professor of Business Management at the University of Granada (Spain). He has also been an Associate Professor at the Complutense University of Madrid, where he obtained his PhD in Economics and Business Management. His main lines of research are service operations management and business simulation.

1 Introduction

Market segmentation has been described as a process of dividing the market into internally homogeneous groups which appear distinct with respect to the other groups. In essence, the market segmentation approach recognises that the total market demand for arts offerings is essentially heterogeneous and, therefore, it can be disaggregated into segments with different needs and preferences (Clopton and Stoddard, 2006). Since performing arts organisations tend to apply a product-centred marketing approach (Colbert et al., 1994), the aim of segmentation is to identify those market segments which would be most susceptible to their actual offerings and distinguish them from those who might be reached only by using more challenging marketing techniques.

In this context, machine learning, a broad subfield of artificial intelligence, is concerned with the development of algorithms and techniques that allow computers to ‘learn’. An algorithm or learning machine able to learn from data extracts rules and patterns from data sets. Machine learning has a wide spectrum of applications, including bio-informatics, medical diagnosis, natural language processing, speech and handwriting recognition and object recognition in computer vision. In economics, finance and marketing, applications include stock market analysis, loan approvals, detecting credit card fraud, detecting tax fraud and even market segmentation. However, a simple match between a practical problem in marketing and an applied methodology in machine learning is still far from being widespread (Cui et al., 2006). For marketing-related purposes, some tasks can be decomposed into subtasks, allowing a wider variety of matches between practical problems and (combinations of) methods (Someren and Urbancic, 2005). Differences in data types, data volumes and cost requirements may affect which methods suit the problem. The data bias in these cases is the most important variable affecting the accuracy of a method. Therefore, the nature of the problem itself imposes a restriction on the performance of an applied method. Choosing a method whose hypothesis space is large enough to contain a solution to the problem and yet small enough to ensure reliable generalisation from reasonably sized training sets is a real challenge in the marketing area. Although there are some proposals to learn the bias automatically (Baxter, 2000), they rely on strong assumptions that rarely hold, so the bias is usually supplied by hand through the skill and insights of marketing experts. This research applies machine learning techniques to the market segmentation of the performing arts audiences as a specific application.

In order to achieve this goal, the paper first describes the different segmentation approaches for the performing arts audiences. We then address how different learning-from-data algorithms constituting the inference engine of a Decision Support System (DSS) or expert system can support the task of market segmentation. To do so, we consider the features that distinguish this task from other decision-making tasks. Afterwards, we design a four-step strategy in order to obtain an accurate market segmentation model by using only machine learning mechanisms. We then apply this strategy to a sample of 800 students who were questioned about their attitudes towards attending opera and ballet performances. As a result, we provide a segmentation model that can be used by performing arts organisations for reaching non-attending groups. The main conclusions are provided in the last section.

2 Segmenting the performing arts audiences

Several segmentation models have been developed for grouping the audiences of the performing arts. Despite a clear conceptual principle of segmentation, Colbert et al. (1994) pointed out problems in the practical application of the segmentation approach within the arts market. The main errors may reside in the insufficient analyses of the market structure involving an assumption that the market is segmented when in reality, it is not and vice versa. However, similar to other sectors, four basic segmentation variables are used in order to divide the arts market: geographic variables, sociodemographic variables, psychographic variables and behavioural variables (Kotler, 1972; Hill et al., 1997).

Some authors proposed the implementation of additional segmentation criteria such as benefit segmentation (Haley, 1968; Colbert et al., 1994; Kotler and Scheff, 1997), frequency-of-participation segmentation (Colbert et al., 1994; Kotler and Scheff, 1997; McCarthy and Jinnett, 2001) and brand/product loyalty segmentation (Kotler, 1972; Colbert et al., 1994; Kotler and Scheff, 1997; Hayes and Slater, 2002). Actually, the three complementary segmentation techniques highlight and develop different aspects within the behavioural approach, which will be discussed further.


2.1 Geographic segmentation

Since the consumption of performing arts is mostly bound to a particular venue, geographic segmentation may be seen as a fairly natural segmentation approach for performing arts organisations. Hence, audiences may be grouped according to their geographical proximity and access to the arts venue. However, with the development of new information technologies enabling online ticket purchases, new and cheaper transport options and, eventually, satellite TV transmission of performances, the geographical catchment area for the performing arts has been considerably enlarged.

2.2 Sociodemographic segmentation

Within sociodemographic segmentation, audiences are divided in terms of different ages, sexes, incomes, education, races, professions, etc. Considering the influences on arts attendance, some sociodemographic factors have to be highlighted as predictors of participation. For instance, Colbert (2003) claimed that the typical consumer of high arts is female, is well educated, earns a relatively high income and holds a white-collar job. In particular, the educational level appears to be most closely related to arts attendance (DiMaggio et al., 1978; Colbert et al., 1994; Kotler and Scheff, 1997; Hill et al., 1997). Although sociodemographic segmentation may be useful, when used alone, it is not sufficient to effectively segment the performing arts market. Therefore, Hill et al. (1997) suggested the exploration of the sociodemographic variables alongside the attitudes and behavioural characteristics of audiences.

2.3 Psychographic segmentation

The psychographic segmentation method implies audience groupings based on personality lines and lifestyles. It involves an analysis of the psychological characteristics of audiences, their attitudes, values and opinions. One of the first psychographic market segmentation models intended for the arts market (the Audience Development Arts Marketing (ADAM) model) was designed by Diggle (1984). It divides people in terms of their behaviour and attitudes towards the arts into ‘available audience’ and ‘unavailable audience’. The available audience is defined as being ready to make a physical commitment to obtain the arts experience that an organisation has for them. It comprises ‘attendees’, those who are presently experiencing the offering of an arts organisation and ‘intenders’, whose attitude towards the offering is favourable but have not yet been persuaded to make a commitment. The ‘unavailable audience’ encompasses those who are ‘indifferent’ or ‘hostile’ and who cannot be reached, motivated and/or turned into customers until they have changed their attitude towards the arts performances.

Other authors combined the psychographic approach with some behavioural variables. For instance, Hayes and Slater (2002) provided a framework for the classification of audiences based on their level of attitudinal (positive vs. negative) and behavioural (high vs. low attendance) loyalty, creating a map of audience development potential. Similarly, a model by Tajtáková and Arias-Aranda (2008) combines interest (attitudinal) and attendance (behavioural) variables to identify four different segments within an arts market: ‘interest/attendance’, ‘interest/no attendance’, ‘no interest/attendance’, ‘no interest/no attendance’. The model aims to distinguish between those who are interested and do participate (current audience) or do not participate (due to lacking abilities or opportunities) and those who are not interested but still attend arts events because of several reasons (e.g., to accompany a partner, friend, family member/s; if one gets tickets as a gift, etc.) or do not attend (indifferent/hostile).

2.4 Behavioural segmentation

The behavioural approach focuses on attendance habits, answering the following questions: When, why, under what circumstances, with what knowledge, how and how often do people attend arts events? Within these concerns, Colbert (2003) emphasised two questions being of utmost interest to marketing experts in the arts: Why do people attend or do not attend the arts and, among those who do not, what can be learned about the differences in motivations?

Several empirical studies were conducted in order to reveal consumer behaviour and the preferences related to the performing arts. Bergadaà and Nyeck (1995), Bouder-Pailler (1999) and Cuadrado and Mollà (2000) aimed to identify those factors which motivate people to attend performing arts events and proposed models for audience classification according to the goals of their attendance. The authors consistently pointed out the entertaining, intellectual and social aspects of motivation among arts audiences.

Surprisingly, research on the barriers to participation faced by non-attending groups is much less represented in the literature than research on the motivations. The works in this field comprise the Research ANd Development (RAND) Corporation model (McCarthy and Jinnett, 2001) that analyses perceptual factors (such as personal beliefs about participating in the arts, perceptions of the arts by one’s reference groups, etc.) and practical factors (such as the price, date and location of the event), both of which may create barriers to attendance. Further, the Motivation Ability Opportunity (MAO) model (Wiggins, 2004) highlights three types of barriers which determine the likelihood of participation: lacking motivation, ability and opportunity to participate or some combination of these. Other studies focused on the barriers perceived by student audiences in attending mainly high arts events (Kolb, 1997; Tajtáková et al., 2005).

2.5 Benefit segmentation

Benefit segmentation consists in dividing a market not on the a priori notions of different groups, but rather in relation to the various benefits that the buyer may be seeking from the particular product (Yankelovich, 1964). Turrini (2002) pointed out that the choice to participate in the arts by different means is not causal, but is connected to the benefit that one is looking for when deciding to attend an arts event. Benefit segmentation is closely related to the behavioural approach reconsidering the question ‘Why do people attend the performing arts?’ Nevertheless, in this case, marketing practitioners would focus on anticipated benefits as perceived by arts consumers rather than on their a priori motivations. In essence, the benefit segmentation approach attempts to group consumers who seek the same benefits from the same product (Colbert et al., 1994).


The benefit segmentation model suggested by Kotler and Scheff (1997) distinguishes between ‘quality buyers’ seeking out the best reputed offerings, ‘service buyers’ who are sensitive to the services provided by the organisation and ‘economy buyers’ who favour the least expensive offers. A different approach was provided by Botti (2000) who summarised the benefits related to arts consumption as functional (cultural) benefits, symbolic benefits, social benefits and emotional benefits. However, Colbert (2003) emphasised that these categories are not exclusive and an individual’s main motivation may put him or her in one category while some of the benefits that he/she seeks may be associated with another category.

2.6 Frequency-of-participation segmentation

Under the frequency-of-participation approach, the market for cultural products is divided according to the relative rate of consumption (Colbert et al., 1994). Considering the 80:20 rule, Kotler and Scheff (1997) suggested that 80% of the purchases in the arts market are made by 20% of the buyers usually referred to as frequent or heavy users, while the remaining 80% encompasses the light users or non-users. Frequent users pay considerable attention to their leisure activities, accepting the arts as an important part of their lives. On the other hand, light users have a tendency to attend one performing arts organisation only or attend a few organisations sporadically. Non-users appear to attribute different characteristics to the same products in comparison to users. As a result, these attributed features may prevent non-users from participation.

However, McCarthy and Jinnett (2001) advocated a combined approach towards segmentation, employing the frequency-of-participation method together with the perceptual and practical factors influencing a predisposition to participate in arts events. They distinguished three market segments within arts audiences: those who are not inclined to participate, those who are inclined to participate but are not currently doing so and those who already participate.

2.7 Brand/Product loyalty segmentation

Brand/Product loyalty segmentation relates to the strength of commitment to an arts organisation, which leads to the preference of one provider in spite of increased incentives to switch (Hayes and Slater, 2002). Regarding the degree of loyalty among arts audiences, a basic distinction can be made between single ticket buyers and subscribers. These two groups may be described in view of the advantages offered to their members. Being a subscriber entails a discount on the cost per performance while ensuring tickets for chosen productions. However, it also brings a higher risk of not being able to attend the preselected performances and the risk of attending a bad performance, since limited information about the quality of a production is available before the season. On the other hand, single ticket purchases provide more flexibility and require no precommitment (Corning and Levy, 2002), but carry a risk that tickets for desired performances and/or preferred dates will be unavailable. A more precise segmentation can be accomplished by adding frequency-of-attendance criteria to the loyalty approach, which combines behavioural and attitudinal loyalty (Hayes and Slater, 2002) and makes it possible to distinguish three attendance-level segments: infrequent attendees, frequent attendees and subscribers (Semenik and Young, 1997).


3 Machine learning and market segmentation

Algorithms and architectures that learn from observed data are continually developed in machine learning research. Learning algorithms can learn similarity patterns from data in an ‘unsupervised’ way. Hence, for the case of marketing research, when trying to decompose the data sets in groups or clusters without any predefined criteria, learning algorithms may not produce the intended results of performance for market segmentation, as they cannot use any marketing response or performance information. When some information is used by an algorithm in order to identify similarity patterns, the learning process is referred to as ‘supervised learning’. The simplest function to be learned by a supervised learning algorithm is a classifier, a function that returns a class value for each configuration of a set of input variables, i.e., a value of a discrete variable called the ‘class variable’. When the function returns a continuous variable, i.e., the function range is continuous, it is not called a classifier, but a ‘regression’ function. In this study, we will focus on this type of classification functions which, with an appropriate approach, are better understood and decomposed by humans.

Data mining applies machine learning techniques in order to extract useful information from large data sets or databases. In marketing research, it is especially useful for analysing the relationships in purchase intentions for different products and services (Crone et al., 2006). This can help marketing practitioners place related products close to each other or offer related services in the same package. Data mining can be applied without knowing which variables contribute to predicting a class variable, or to what degree. An expert system is able to predict the failure risk in a new product launch, considering hundreds of variables, without providing an interpretable model (Sun, 2006). This is the case with neural network algorithms, a nature-inspired learning approach that builds black-box models, i.e., models that cannot be understood by humans. While black-box models are perfectly suited for classification tasks, for instance, the risk prediction of personal credit in a bank or the risk of death of a patient in a cancer hospital, they cannot be used when the model itself needs to be known. Another black-box learning approach, known as ‘instance-based learning’ (Aha et al., 1991), has among its main features a very low computational cost. The k-nearest neighbour algorithm, with k being an integer greater than 0, must be mentioned as an example because of its simplicity. For each pattern, it returns the most common class value among its k closest neighbours, i.e., the set of k patterns with the shortest distance to the pattern to be classified. Different measures of distance can be used in this case.
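The k-nearest neighbour rule described above can be illustrated with a minimal sketch. The function name `knn_classify`, the toy patterns and the choice of Euclidean distance are assumptions made for illustration only; they are not taken from the study.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the most common class among the k training patterns closest to query.

    train: list of (feature_vector, class_value) pairs (hypothetical format).
    """
    # Euclidean distance is one possible measure; others can be swapped in.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up toy patterns: (age, visits per year) -> class value.
train = [
    ((20, 0), "non-attendee"),
    ((22, 1), "non-attendee"),
    ((25, 0), "non-attendee"),
    ((24, 6), "attendee"),
    ((21, 5), "attendee"),
    ((26, 8), "attendee"),
]
print(knn_classify(train, (23, 7), k=3))
```

Note that the sketch stores the whole training set and defers all work to classification time, which is why no interpretable model is produced.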

In our study, the market segmentation process can be performed by using machine learning techniques. However, as variable interaction defines different segments, only white-box methods are able to learn classifiers that provide a model interpretable by humans. Nevertheless, other machine learning techniques, such as feature selection or feature extraction, are applicable as a pre-processing step prior to black-box classifier-builders. Such techniques yield the set of variables affecting the class, ordered by their influence, so black-box models may still play an important role as part of a more complex strategy.

Perhaps the most widely used white-box learning algorithms to build classifiers are those building decision trees and those building Bayesian Networks (BNs). A decision tree is a visual and analytical decision support tool where the expected values of competing alternatives are calculated. Figure 1 shows an example of a very simple decision tree to assist in the decision-making process about paragliding. The input variables are ‘wind speed’, ‘pilot experience’ and ‘quality of the paraglide’. Each leaf node represents a decision, with only two different decisions: ‘to fly’ or ‘not to fly’. Although the expected values (usually likelihoods or probabilities) are not provided in the figure, they must also be computed by the algorithm for each leaf node.

Figure 1 A decision tree for decision-making support in paragliding (see online version for colours)

Note: The expected values for each decision or leaf node (‘to fly’ or ‘not to fly’) do not appear in the example. The decision nodes are coloured in orange, while the leaf nodes are in pale blue.

A BN consists of (1) a Directed Acyclic Graph (DAG), where each node represents a random variable and arcs represent the probabilistic dependencies between these variables, known as the structure, the model itself or the qualitative part of the BN, and (2) a conditional probability distribution of the form $P(x \mid \pi_x)$ for each node $x$, given its parent set $\pi_x$. This second part of the BN is called the parameters or the quantitative part of the network. Together, the two parts define the joint probability distribution over all the variables:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_{x_i}).$$
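As a small worked instance of this factorisation, consider a hypothetical three-node network (not one used in this study) whose DAG is the chain $a \rightarrow b \rightarrow c$. The parent sets are then $\pi_a = \emptyset$, $\pi_b = \{a\}$ and $\pi_c = \{b\}$, so the product reduces to three factors:

```latex
P(a, b, c) = P(a)\, P(b \mid a)\, P(c \mid b)
```

Each factor corresponds to one node's conditional probability table, which is why the quantitative part grows with the size of the parent sets and not with the total number of variables.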


If a BN is defined only to assign a value to one discrete variable, the class, given a set of attribute values, it works as a classifier. These Bayesian classifiers are applied as part of a DSS in financial companies for decision-making processes related to credit card applications and/or loan approvals. Attributes such as the amount of money currently in a checking bank account versus the salary assignment, credit records, seniority in the same employment, personal status or loan purpose are used for classification purposes.

One of the most effective BN classifiers, in spite of its simplicity, is the Naive Bayes (NB) classifier. The model of the BN used by this classifier makes a strong independence assumption: all the attributes $x_1, x_2, \ldots, x_n$ are conditionally independent, given the class $y$, as shown in Figure 2(A). Figure 2(B) shows the structure of an NB classifier for the DSS about a credit authorisation with four input variables. Among other applications, the NB algorithm was used to learn a Bayesian classifier from a data set called crx, with information about 690 credit card applications in an Australian bank (Blake et al., 1998). Each application contained 15 categorical and continuous attributes. The NB algorithm achieved 83% generalisation accuracy, measured by using a five-fold cross-validation. This implies that an erroneous decision was made in roughly 17 of every 100 credit card applications. Accuracy increased to 88% when only the most relevant input attributes were chosen by using a wrapper feature selection algorithm (John et al., 1994). More sophisticated models have been defined and tested in the machine learning literature. One of these models, called Augmented Naive Bayesian (ANB) networks (Friedman et al., 1997), allows edges among the attributes, thus relaxing the strong assumptions of the NB classifier.
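The independence assumption can be made concrete with a minimal sketch for categorical attributes. The function names, the made-up credit records and the use of add-one (Laplace) smoothing are assumptions for illustration; this is not the implementation used on the crx data set.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate the counts needed for P(y) and P(x_i | y)."""
    class_counts = Counter(labels)
    # value_counts[(i, y)][v] = number of class-y rows whose attribute i equals v
    value_counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values observed for each attribute
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def posterior(model, row):
    """Return P(y | row) for every class y under the NB independence assumption."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(y)
        for i, v in enumerate(row):
            # Add-one smoothed estimate of P(x_i = v | y) (an assumption of this sketch).
            score *= (value_counts[(i, y)][v] + 1) / (n_y + len(values[i]))
        scores[y] = score
    norm = sum(scores.values())
    return {y: s / norm for y, s in scores.items()}

# Made-up records: (credit record, employment seniority) -> decision.
rows = [("good", "long"), ("good", "short"), ("bad", "short"),
        ("bad", "long"), ("good", "long"), ("bad", "short")]
labels = ["approve", "approve", "reject", "reject", "approve", "reject"]

model = train_nb(rows, labels)
probs = posterior(model, ("good", "long"))
print(max(probs, key=probs.get), probs)
```

Because each attribute contributes one independent factor, the model needs only one small table per attribute and class, which is a large part of the simplicity noted above.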

Figure 2 (A) A graph for the NB classifier with four input features. (B) The structure of an NB classifier for the credit example



4 Developing a strategy for automatic market segmentation

While the quantitative part of a Bayesian classifier can compute the posterior probabilities for the class given a configuration for all or some of the input variables, the graph can provide information about independence among the variables. However, it seems intuitive that a decision tree can model a market segmentation in a much more direct way, as a hierarchical segmentation can be read directly from the tree. Looking only at the first level of the hierarchy, we obtain a very simple segmentation based on only one variable. If no more variables are included in the tree, the root variable alone explains the segmentation. Let us suppose that an algorithm is used to build a decision tree for the market segmentation of car consumers. The class variable has four different values: ‘sport car’, ‘family car’, ‘wagon’ and ‘4 wheel wire car (4WW)’, depending on the preference of a car consumer. The input variables used for learning the model include sex, age, marital status, income, profession, etc. If the resulting decision tree contained only the ‘incomes’ variable, it would mean that income alone is enough to predict car consumer preferences. If a more complex decision tree were obtained, a publicity campaign should be designed that takes into account the other variables in the tree. According to the decision tree shown in Figure 3, a major marketing action to encourage the purchase of 4WW cars should target men over 40 years old, younger but married men, or professional sports people.

Figure 3 A decision tree for the market segmentation of car consumers (see online version for colours)

Note: The decision nodes are coloured in orange, while the leaf nodes are in pale blue.
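The way a hierarchical segmentation is read from a tree can be sketched as follows: every root-to-leaf path is one segment, described by the conditions along the path. The tree below is a hypothetical toy example and does not reproduce Figure 3; the nested-tuple representation is likewise an assumption of this sketch.

```python
# Hypothetical toy tree, NOT a reproduction of Figure 3.
# Internal node: (variable, {value: subtree}); leaf: a class label (str).
tree = ("income", {
    "low": "family car",
    "high": ("marital status", {
        "single": "sport car",
        "married": "wagon",
    }),
})

def segments(node, path=()):
    """Yield (conditions, class) for every root-to-leaf path of the tree."""
    if isinstance(node, str):  # leaf: the path so far defines one segment
        yield path, node
        return
    variable, branches = node
    for value, subtree in branches.items():
        yield from segments(subtree, path + ((variable, value),))

for conditions, label in segments(tree):
    rule = " and ".join(f"{var} = {val}" for var, val in conditions)
    print(f"{rule} -> {label}")
```

Truncating each path after its first condition gives the one-variable segmentation described for the first level of the hierarchy.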


However, in order to assess the robustness of a decision tree, its accuracy has to be compared with that of other classifiers (either black-box or white-box models). If the accuracy of the decision tree is significantly lower, different algorithms for decision tree building or even different approaches should be considered. Moreover, at least the variables that most affect the class, i.e., those in the highest levels of the tree, should also be considered important by other models learned by using different approaches.

Thus, in order to assess the reliability of a decision tree as a model for market segmentation, we propose the following four-step strategy:

Step 1 Inclusion of sample sets with as much information as available, considering the variables obtained through different current expert human models about the market segmentation of a population.

Step 2 Use algorithms to learn the classifiers from different approaches and compute their predictive accuracy for a given data set. At least one algorithm based on decision trees is used. If the accuracy levels are not satisfactory in all algorithms, even in those known to be robust for superfluous attributes, the data set is enhanced by increasing the number of instances and/or the number of variables, as either there are not enough instances or those variables with the strongest association with the class in this population are not included.

Step 3 If the predictive accuracy for the decision tree algorithm is satisfactory, a decision tree is generated by using all the instances in the sample and adopted as a model of a hierarchical segmentation.

Step 4 Use a feature selection or feature extraction algorithm for all the learning algorithms used in Step 2. When a feature selection algorithm is used, the set of input variables is ordered by the degree of association with the class, as given by the association measure used by the algorithm. If the models disagree on the variables associated with the class, especially the decision tree versus any other model, and the accuracy of the decision tree is lower than that reported by the other models, the branches corresponding to the variables in disagreement should be pruned by the expert in order to improve the model or even develop a new one.
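Steps 2 and 3 can be sketched in a few lines: estimate the cross-validated accuracy of a white-box and a black-box learner on the same data, then compare them. Everything in the sketch is an assumption for illustration: the one-level tree (`one_rule`) stands in for a full decision tree learner, 1-nearest neighbour stands in for the black-box models, and the data are made up. Step 4 (feature ranking and expert pruning) is not covered.

```python
import random
from collections import Counter

def one_rule(train):
    """White-box learner: a one-level decision tree on the single best attribute."""
    n_attrs = len(train[0][0])
    default = Counter(y for _, y in train).most_common(1)[0][0]
    best = None
    for i in range(n_attrs):
        by_value = {}
        for x, y in train:
            by_value.setdefault(x[i], Counter())[y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        hits = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or hits > best[0]:
            best = (hits, i, rule)
    _, attr, rule = best
    return lambda x: rule.get(x[attr], default)

def nearest_neighbour(train):
    """Black-box learner: 1-nearest neighbour with Hamming distance."""
    def predict(x):
        return min(train, key=lambda item: sum(a != b for a, b in zip(item[0], x)))[1]
    return predict

def cv_accuracy(learner, data, folds=5, seed=0):
    """Estimate predictive accuracy with k-fold cross-validation."""
    data = data[:]
    random.Random(seed).shuffle(data)
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [d for j, d in enumerate(data) if j % folds != f]
        predict = learner(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)

# Made-up data: (interest, gender, repertory knowledge) -> class; only interest matters.
data = [((interest, gender, knows), "attendee" if interest else "non-attendee")
        for interest in (0, 1) for gender in (0, 1) for knows in (0, 1, 2)] * 5

tree_acc = cv_accuracy(one_rule, data)
knn_acc = cv_accuracy(nearest_neighbour, data)
print(f"one-rule tree: {tree_acc:.2f}, 1-NN: {knn_acc:.2f}")
```

If the tree's accuracy is comparable to that of the other learner, Step 3 would refit the tree on all instances, here simply `one_rule(data)`, and adopt it as the segmentation model.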

5 Empirical evaluation

In this section, we describe the sample and data sets used. Second, we refer to the algorithms and technical issues related to the application of the four-step strategy described in the previous section. Finally, the accuracy results and the segmentation model that was created by using this strategy are presented with the final conclusions.

5.1 The sample

The sample was obtained from a survey among university students, aged mostly from 18 to 26 years, during the spring term of 2004 in Bratislava (Slovakia). The sample consisted of 800 individuals from different fields of study. The study was conducted by the Slovak National Theatre Bratislava in cooperation with the University of Economics in Bratislava. The respondents were personally interviewed through a standardised questionnaire. The questionnaire was divided into five parts:

1 their associations and attitudes towards opera and ballet

2 their attendance and intentions to attend opera and ballet performances

3 their motivations, barriers and expectations

4 their knowledge of the repertory and admission prices for students

5 demographic variables.

Before generating its final version, the questionnaire was tested on a small sample. The demographic profile of the respondents is consistent with the demographic structure of students at Slovak universities. In the academic year 2003/2004, there were 101 429 individuals pursuing their studies at the universities in Slovakia and 51 359 (50.64%) of them were women. In the survey, females comprised 50% of the sample. Regarding age, the students between 18 and 21 years represented 50.25% of the respondents and those between 22 and 26 comprised 45.25% of the sample. There were 0.50% of students under 18 and 4.00% over 26. The majority of universities in Slovakia offer a five-year study programme in undergraduate studies and only a few of them offer six-year studies. The distribution of the respondents according to their year of study was as follows: 16.75% in the first year of studies, 22.38% in the second year, 21.62% in the third, 27.25% in the fourth, 11.00% in the fifth and 1.00% in the sixth year.
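The figures quoted above can be checked for internal consistency with a short calculation. The head counts it prints are derived here under the assumption that every percentage refers to the full sample of 800 respondents; they are not reported in the survey description.

```python
# Figures quoted in the text.
women, students = 51_359, 101_429
share_women = round(100 * women / students, 2)

year_shares = [16.75, 22.38, 21.62, 27.25, 11.00, 1.00]  # first to sixth year
age_shares = [0.50, 50.25, 45.25, 4.00]                  # <18, 18-21, 22-26, >26

print(share_women)
print(round(sum(year_shares), 2), round(sum(age_shares), 2))

# Implied head counts out of 800 respondents (an assumption, rounded to whole persons).
head_counts = [round(800 * s / 100) for s in year_shares]
print(head_counts, sum(head_counts))
```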

5.2 Methodology

The four-step strategy explained in the previous section was applied. To accomplish the first step, the survey was designed to include a variety of variables used by the different current models of market segmentation. From the survey, two different data sets were obtained: one for the market segmentation of the attendees/non-attendees of opera performances (Opera) and the other for the market segmentation of the attendees/non-attendees of ballet performances (Ballet). Variables 8 to 11 in the survey were disregarded, as whether they were answered depended on the answers given to previous questions. Moreover, some questions in the survey referred to the opera, while others referred to the ballet. Taking all these issues into account, each data set was finally made up of 18 variables (see Table 1).


Table 1 The variables used in the opera (O) and ballet (B) data sets

| ID | Name | Type | Values |
|---|---|---|---|
| 1 | Last time attending an O/B performance | Input, ordinal | Never (1), not this year (2), in the past 12 months (3), last month (4), this month (5) |
| 2 | Know the repertory | Input, ordinal | Not at all (1), only some pieces (2), I know approximately what’s on (3), well (4) |
| 3 | Attitude | Input, ordinal | Not interested and do not attend (1), not interested but sometimes I attend a performance (e.g., to accompany a partner, friend or a family member, if I get tickets as a gift, etc.) (2), interested but I have not had an opportunity to attend yet (3), interested and do attend (4) |
| 4 | Expectations about entertainment | Input, ordinal | 1, 2, 3, 4, 5 * |
| 5 | Expectations about relaxation | Input, ordinal | 1, 2, 3, 4, 5 * |
| 6 | Expectations about emotional experience | Input, ordinal | 1, 2, 3, 4, 5 * |
| 7 | Expectations about new incentives – inspiration | Input, ordinal | 1, 2, 3, 4, 5 * |
| 8 | Expectations about educational development | Input, ordinal | 1, 2, 3, 4, 5 * |
| 9 | Expectations about broadening my scope in culture | Input, ordinal | 1, 2, 3, 4, 5 * |
| 10 | Expectations about atmosphere (of the venue – event, etc.) | Input, ordinal | 1, 2, 3, 4, 5 * |
| 11 | Visited an opera performance abroad | Input, nominal | Yes (1), no (0) |
| 12 | Gender | Input, nominal | Female (1), male (2) |
| 13 | Age | Input, ordinal | 1–18 years old (1), 18–21 (2), 22–25 (3), 26 and more (4) |
| 14 | Study grade | Input, ordinal | 1, 2, 3, 4, 5, 6 |
| 15 | Faculty | Input, nominal | Economics, humanities, life sciences, polytechnics |
| 16 | Population (per thousand inhabitants) | Input, ordinal | City over 100 (1), town from 50 to 100 (2), town from 5 to 50 (3), village to 5 (4) |
| 17 | Region | Input, nominal | Bratislavský, Trnavský, Trenčiansky, Nitriansky, Žilinský, Banskobystrický, Prešovský, Košický, foreign country |
| 18 | Wish to go in the future | Class, nominal | Yes (1), no (0) |

Note: (*) 1 means very important and 5 means not important at all.

For the second step, the variable that was chosen as the class or predicted variable was a binary variable derived from question number 4 in the survey: ‘Do you wish to attend an opera/ballet performance in the future?’ We used three different machine learning approaches to obtain the predictive accuracy for both data sets:


1 Decision trees: We used C4.5 (Quinlan, 1994), an algorithm for building decision trees that usually achieves high accuracy levels on data from very different populations.

2 BNs: We used NB, a very simple but very efficient Bayesian classifier. We considered two different values of the hyperparameter α (Friedman et al., 1997): α = 0 and α = 1. In the first case, Maximum Likelihood (ML) is used to estimate each probability distribution. In the second case, a Bayesian estimation is used, with the prior being the marginal distribution for each variable.

3 Instance-based algorithms: We used the popular k-nearest neighbour algorithm with two different values of k: k = 1 and k = 5. A value of k = 1 is usually less robust than slightly larger values.
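The difference between the two NB configurations can be illustrated with a short sketch. It assumes one common way of writing such an estimate, in which α acts as the weight given to a prior equal to the marginal distribution of the variable; the exact formula used in the study is not reproduced here, and the function name and toy data are hypothetical.

```python
from collections import Counter


def conditional_estimates(values, labels, target_class, alpha):
    """Estimate P(value | class) for one input variable (hypothetical helper).

    alpha = 0 gives the maximum-likelihood relative frequency.
    alpha > 0 shrinks the estimate towards the marginal P(value),
    which is the assumed form of the Bayesian estimate.
    """
    n = len(values)
    marginal = {v: c / n for v, c in Counter(values).items()}
    in_class = [v for v, y in zip(values, labels) if y == target_class]
    counts = Counter(in_class)  # missing values count as 0
    n_c = len(in_class)
    return {v: (counts[v] + alpha * marginal[v]) / (n_c + alpha) for v in marginal}


# Invented toy data: an ordinal input coded 1-4 and a binary class.
attitude = [1, 1, 2, 3, 4, 4, 4, 3]
wish = [0, 0, 0, 1, 1, 1, 1, 1]

ml = conditional_estimates(attitude, wish, target_class=1, alpha=0)
bayes = conditional_estimates(attitude, wish, target_class=1, alpha=1)
```

With α = 0, a value never observed together with a class receives probability zero, which cancels the whole product in NB; with α = 1 it receives a small positive probability instead.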

Learning-from-data algorithms need a data set to infer the classifier. This data set is usually referred to as the training set. In order to measure the predictive or generalisation accuracy of the classifier, i.e., how well the classifier will perform with a new instance, a data set not used for training should be chosen. This data set is called the test data set. Usually, only one data set is provided. In order to reduce the variance of the accuracy estimate, instead of splitting it once into a training data set and a test data set, other solutions such as cross-validation are frequently applied. In cross-validation, the original data set is divided into f folds. The learning algorithm is run f times. On run t (t ∈ {1, 2, ..., f}), the test data set is composed of all the instances in fold t, and a different classifier is inferred by using all the instances in the remaining f − 1 folds as the training data set. The test accuracy is computed for each classifier. The reported predictive accuracy is the test accuracy averaged over the f folds. In this study, five-fold cross-validation was applied, as it has been reported to offer a good trade-off between the quality of the accuracy estimate and computational cost (Kohavi, 1995).
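The cross-validation procedure described above can be sketched as follows, using k-nearest neighbour as the learning algorithm. This is a minimal illustration rather than the implementation used in the study: the distance function (sum of absolute differences between ordinal codes), the random assignment of instances to folds, the function names and the toy data are all assumptions.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training instances closest to `query`."""
    def distance(a, b):
        # Assumption: sum of absolute differences over ordinal codes.
        return sum(abs(u - v) for u, v in zip(a, b))

    ranked = sorted(range(len(train_x)), key=lambda i: distance(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, f, k, seed=0):
    """Average test accuracy of k-nearest neighbour over f folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[t::f] for t in range(f)]  # f disjoint folds
    fold_accuracies = []
    for t in range(f):
        test = folds[t]
        # Train on the instances of the remaining f - 1 folds.
        train = [i for s in range(f) if s != t for i in folds[s]]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test)
        fold_accuracies.append(hits / len(test))
    return sum(fold_accuracies) / f


# Invented toy data: two well-separated groups of ordinal answers.
class_0 = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 1)] * 2
class_1 = [(4, 4), (4, 5), (5, 4), (5, 5), (5, 5)] * 2
xs = class_0 + class_1
ys = [0] * len(class_0) + [1] * len(class_1)

accuracy_1nn = cross_validated_accuracy(xs, ys, f=5, k=1)
accuracy_5nn = cross_validated_accuracy(xs, ys, f=5, k=5)
```

Each instance is used for testing exactly once and for training f − 1 times, so the averaged accuracy depends less on one particular split than a single training/test partition would.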

Once we had obtained the five values of predictive accuracy and checked that the accuracy reported by C4.5 was satisfactory, we obtained the decision tree (Step 3) and went on to Step 4. In this step, we applied a forward wrapper selection algorithm to those approaches without any embedded feature selection mechanism, i.e., the NB and k-nearest neighbour algorithms. Wrapper selection means that the selection is performed together with the learning algorithm. It ‘wraps’ the learning algorithm, so that the criterion used to select a variable is the generalisation accuracy of the learning algorithm when using the current selection of variables (John et al., 1994). C4.5 is the only one of these algorithms that includes a feature selection mechanism, which works by pruning the tree. This is one of the reasons for the high accuracy that C4.5 usually achieves (Quinlan, 1994). We also checked whether there was agreement between the variables selected by C4.5 and those selected by any other algorithm.
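A forward wrapper selection of this kind can be sketched as a greedy loop. The sketch below assumes that selection stops as soon as no remaining variable improves the accuracy; the stopping rule actually used in the study is not stated. The `evaluate` argument stands for any function that returns the generalisation accuracy of the wrapped learning algorithm for a given list of variables (for instance, a cross-validated accuracy). The variable names and accuracy gains in the toy scorer are invented for illustration and are not results of the study.

```python
def forward_wrapper_selection(candidates, evaluate):
    """Greedily add the variable that most improves `evaluate`.

    Returns the ordered list of selected variables and the final score.
    """
    selected = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current selection.
        scored = [(evaluate(selected + [v]), v) for v in remaining]
        score, variable = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # assumed stopping rule: no extension improves accuracy
        selected.append(variable)
        remaining.remove(variable)
        best_score = score
    return selected, best_score


def toy_accuracy(subset):
    """Hypothetical stand-in for the wrapped learner's accuracy."""
    gains = {"x1": 0.30, "x2": 0.02, "x3": 0.01}  # invented values
    return 0.50 + sum(gains.get(v, -0.01) for v in subset)


order, accuracy = forward_wrapper_selection(["x4", "x3", "x2", "x1"], toy_accuracy)
```

Because the most useful variable is added first, the returned list is ordered by the step at which each variable entered the selection, which is how an ordered set of selected variables can be read.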

6 Results

Table 2 shows the predictive accuracy values for the five algorithm configurations used. NB0 means NB with ML estimation; NB1 means NB with Bayesian estimation and α = 1; and 1nn and 5nn mean the k-nearest neighbour algorithm with k = 1 and k = 5, respectively. The highest accuracy level is achieved by C4.5, while 1nn obtained the lowest level. A predictive accuracy of 83.61% for the Opera data set and 82.29% for the Ballet data set means that, with the data used as input variables, the decision tree learned by the C4.5 algorithm is able to correctly classify more than 80% of the new instances belonging to the same population. Figures 4 and 5 show the decision trees that were built by this algorithm when using the entire sample as a training set.

Table 2 The predictive accuracies for different algorithms

| Algorithm | Opera | Ballet |
|---|---|---|
| C4.5 | 0.83610 | 0.82293 |
| NB0 | 0.80052 | 0.79873 |
| NB1 | 0.80561 | 0.80127 |
| 1nn | 0.75737 | 0.71083 |
| 5nn | 0.80180 | 0.79108 |

Note: The highest accuracy for each data set is shown in shadow.

Figure 4 A decision tree for the opera data set (see online version for colours)

Note: The whole data set was used in order to obtain this model.


Figure 5 A decision tree for the ballet data set (see online version for colours)

Note: The whole data set was used in order to obtain this model.

In Table 3, we show the predictive accuracy levels for all the algorithms when a forward wrapper selection algorithm is applied. As the accuracy levels always improve, it can be assumed that some input variables were not informative about the class. The accuracy levels reported for C4.5 are the same as those shown in Table 2, as C4.5 already includes a selection algorithm. Table 3 also shows an ordered set of the selected variables. For both data sets, the third variable (Attitude) turned out to be the first one selected by all the algorithms, so its importance seems to be crucial in the segmentation of the sample for both opera and ballet performances. In the Opera data set, the other variables chosen by an algorithm other than C4.5 are number 16 (Population size) and number 12 (Gender). The highest accuracy level when selection was used in the Opera data set was achieved by NB1, with only Attitude being chosen. However, the accuracy of C4.5 was very close to the highest one.

In the Ballet data set, an overall agreement about Attitude was also found. In this case, C4.5 again achieved the highest accuracy level, even though the other algorithms used feature selection. The eighth variable (Expectations about educational development) should also be considered highly important in the segmentation of Ballet attendance, as it was used by all the algorithms with the highest accuracy levels. The sixth variable (Expectations about emotional experience) was also chosen by NB0 and NB1 as a highly relevant variable. These algorithms reported the second and third highest accuracy levels. The first variable (Last time attending an Opera/Ballet performance) was selected in both the Opera and Ballet data sets by C4.5, but not by any other algorithm. The agreement of C4.5 about this variable for both data sets gives more credibility to the decision trees that include this variable.


Table 3 The predictive accuracies with feature selection and an ordered set of selected variables

| Algorithm | Opera: accuracy | Opera: ordered set of selected variables | Ballet: accuracy | Ballet: ordered set of selected variables |
|---|---|---|---|---|
| C4.5 | 0.83610 | 3, 1, 12, 13, 2, 16 | 0.82293 | 3, 8, 17, 6, 1, 5 |
| NB0 | 0.83229 | 3 | 0.81911 | 3, 15, 14, 6, 8, 11 |
| NB1 | 0.83739 | 3 | 0.82038 | 3, 15, 14, 6, 8, 11 |
| 1nn | 0.77892 | 3, 16 | 0.81529 | 3 |
| 5nn | 0.80813 | 3, 16, 12 | 0.82038 | 3, 8, 4 |

Note: The highest accuracy for each algorithm is shown in shadow.
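The agreement between C4.5 and the other algorithms can be checked mechanically from the variable sets reported in Table 3. The following short sketch does so with plain set operations; the dictionary layout and the function name are illustrative choices, while the variable identifiers are those listed in the table.

```python
# Selected variables per algorithm, copied from Table 3.
opera = {
    "C4.5": [3, 1, 12, 13, 2, 16],
    "NB0": [3],
    "NB1": [3],
    "1nn": [3, 16],
    "5nn": [3, 16, 12],
}
ballet = {
    "C4.5": [3, 8, 17, 6, 1, 5],
    "NB0": [3, 15, 14, 6, 8, 11],
    "NB1": [3, 15, 14, 6, 8, 11],
    "1nn": [3],
    "5nn": [3, 8, 4],
}


def only_in_c45(selection):
    """Variables used by the C4.5 tree but selected by no other algorithm."""
    others = set()
    for name, variables in selection.items():
        if name != "C4.5":
            others.update(variables)
    return set(selection["C4.5"]) - others


opera_only = only_in_c45(opera)
ballet_only = only_in_c45(ballet)
selected_by_all_opera = set.intersection(*(set(v) for v in opera.values()))
```

The variables returned by `only_in_c45` are the ones on which C4.5 stands alone, and the intersection over all algorithms contains the variables on which every algorithm agrees.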

Figure 6 A pruned decision tree for the ballet data set (see online version for colours)

Given the high accuracy achieved by C4.5 on both data sets, we used the decision trees in Figures 4 and 5 to segment the sample according to the respondents’ intentions to attend (leaves labelled ‘yes’) or not to attend (leaves labelled ‘no’) an opera and a ballet performance, respectively. However, some branches could be pruned if the model were to be used for a wider population. The candidates for pruning are those variables not located at the root of the tree (target subsegments) that were not selected by any other learning algorithm (Variables 2 and 13 in the Opera data set and Variables 5 and 17 in the Ballet data set). Such variables can be removed only if they are not the predecessor of a more important variable. On the one hand, in the Opera data set, neither Variable 2 (Know the repertory) nor Variable 13 (Age) can be removed, as they are both predecessors of the more important Variable 16 (Population size). On the other hand, in the Ballet data set, Variables 5 (Expectations about relaxation) and 17 (Region) can both be removed, as their only descendant is the first variable (Last time attending), which has not been chosen by any other selection process. The resulting pruned decision tree is shown in Figure 6. To determine the value of the leaf node that replaces the removed variable, the criterion of the learning algorithm must be used, i.e., the value with the highest score must be chosen. In the case of C4.5, the criterion is ML, so the branch with the highest frequency should be chosen. In the example, the branch with the highest frequency was the one ending with the leaf node ‘No’, meaning a non-attending group.
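The replacement of a removed variable by a leaf can be sketched as follows. The sketch assumes that the new leaf is labelled with the most frequent class among the training instances that reach the removed node, which is one usual reading of the ML criterion; the tree representation, the function names and the class counts are hypothetical and are not taken from Figures 5 or 6.

```python
from collections import Counter


def class_counts(node):
    """Sum the class counts of the training instances reaching `node`."""
    if "counts" in node:  # leaf
        return Counter(node["counts"])
    total = Counter()
    for child in node["children"].values():
        total += class_counts(child)
    return total


def collapse_to_leaf(node):
    """Replace a subtree by a leaf labelled with its most frequent class."""
    counts = class_counts(node)
    label = counts.most_common(1)[0][0]
    return {"label": label, "counts": dict(counts)}


# Invented subtree: a split on one variable with two leaves below it.
subtree = {
    "variable": 17,
    "children": {
        "branch_a": {"label": "no", "counts": {"yes": 12, "no": 30}},
        "branch_b": {"label": "yes", "counts": {"yes": 9, "no": 4}},
    },
}

leaf = collapse_to_leaf(subtree)
```

In this invented example, 34 of the 55 instances below the removed split belong to class ‘no’, so the new leaf is labelled ‘no’.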

7 Conclusions

Machine learning methods applied to marketing issues become powerful tools for data mining with large noisy databases. These methods increase the possibilities for researchers to gain new insights into consumer preferences while improving the accuracy of the prospective and predictive models used to attract audiences to the performing arts. However, there are still many decisions that cannot be taken automatically. Those decisions refer, among others, to the selection of a convenient approach given the problem bias, and to the detection of noise levels that may recommend using an appropriate feature selection or extraction method, depending once more on the nature of the problem to be solved. Moreover, the combination of more than one approach may result in a noticeable increase in knowledge about the problem. The main contribution of this study for the research community is the finding that the application of wrapper selection and instance-based approaches increases the robustness of the decision tree developed by the decision-making support system. This has been shown to have distinct advantages in accuracy and explanatory insight for modelling potential audience response as a management decision-making tool. In addition, the main application of this paper for practitioners is related to the application of learning-from-data algorithms on an ‘unsupervised’ basis to marketing research in order to increase accuracy in market segmentation. By implementing the C4.5 algorithm for building decision trees, NB for BNs and the k-nearest neighbour algorithm for the instance-based approach, this study can be replicated and applied to other marketing and management issues related to DSS in different sectors.

In spite of this, the significant impact of most selection variables in the performing arts may call for further research on the validity and reliability of the current method selection practices based on multivariate statistical methods. Hence, this study justifies the structured analysis of the sampling, coding and scaling processes in order to assure valid and reliable results from classification methods. In any case, the combination of different approaches increases the model validity and yields superior performance when compared to individual machine learning algorithms or other traditional content-based techniques (Tajtáková and Arias-Aranda, 2008), not only for the atmosphere of live performances, but also for the bundled emotional experience. The significant variables identified in this study point to an opportunity to increase the educational activities that theatres provide for present and future target audiences. Broad lines of future research open up when considering the application of combined Bayesian methods to cross-cultural issues in order to foster attendance at opera and ballet in different geographical, cultural and social environments.


References

Aha, D.W., Kibler, D. and Albert, M.K. (1991) ‘Instance based learning algorithms’, Machine Learning, Vol. 6, pp.37–63.

Bergadaà, M. and Nyeck, S. (1995) ‘Quel marketing pour les activités artistiques: une analyse qualitative comparée des motivations des consommateurs et producteurs de théâtre’, Recherche et Applications en Marketing, Vol. 10, No. 4, pp.27–45.

Baxter, J. (2000) ‘A model of inductive bias learning’, Journal of Artificial Intelligence Research, Vol. 12, pp.149–198.

Blake, C., Keogh, E. and Merz, C.J. (1998) ‘UCI repository of machine learning databases’, http://www.ics.uci.edu/~mlearn/MLRepository.html (available November 2006).

Botti, S. (2000) ‘What role for marketing in the arts? An analysis of arts consumption and artistic value’, International Journal of Arts Management, Vol. 2, No. 3, pp.14–27.

Bouder-Pailler, D. (1999) ‘A model for measuring the goals of theatre attendance’, International Journal of Arts Management, Vol. 1, No. 2, pp.5–15.

Clopton, S.W. and Stoddard, J.E. (2006) ‘Event preferences among arts patrons: implications for market segmentation and arts management’, International Journal of Arts Management, Vol. 9, No. 1, pp.48–59.

Colbert, F. (2003) ‘Entrepreneurship and leadership in marketing the arts’, International Journal of Arts Management, Vol. 6, No. 1, pp.30–39.

Colbert, F., et al. (1994) Marketing Culture and the Arts, Montreal: Morin.

Corning, J. and Levy, A. (2002) ‘Demand for live theatre with market segmentation and seasonality’, Journal of Cultural Economics, Vol. 26, No. 3, pp.217–235.

Crone, S., Lessmann, S. and Stahlbock, R. (2006) ‘The impact of preprocessing on data mining: an evaluation of classifier sensitivity in direct marketing’, European Journal of Operational Research, Vol. 173, No. 3, pp.781–800.

Cuadrado, M. and Mollà, A. (2000) ‘Grouping performing arts consumers according to attendance goals’, International Journal of Arts Management, Vol. 2, No. 3, pp.54–60.

Cui, G., Wong, M. and Lui, H. (2006) ‘Machine learning for direct marketing response models: Bayesian networks with evolutionary programming’, Management Science, Vol. 52, No. 4, pp.597–613.

Diggle, K. (1984) ‘The A.D.A.M model’, www.audience-development.net (accessed 19 August 2006).

DiMaggio, P.J., Seem, M. and Brown, P. (1978) Audience Studies of the Performing Arts and Museums: A Critical Review, Washington: National Endowment for the Arts.

Friedman, N., Geiger, D. and Goldszmidt, M. (1997) ‘Bayesian network classifiers’, Machine Learning, Vol. 29, pp.131–163.

Haley, R.J. (1968) ‘Benefit segmentation: a decision-oriented research tool’, Journal of Marketing, July, pp.30–35.

Hayes, D. and Slater, A. (2002) ‘Rethinking the missionary position – the quest for sustainable audience development strategies’, Managing Leisure, Vol. 7, pp.1–17.

Hill, E., O’Sullivan, C. and O’Sullivan, T. (1997) Creative Arts Marketing, Butterworth-Heinemann.

John, G.H., Kohavi, R. and Pfleger, K. (1994) ‘Irrelevant features and the subset selection problem’, Proceedings of the Eleventh International Conference on Machine Learning, San Francisco, CA: Morgan Kaufmann Publishers, pp.121–129.

Kohavi, R. (1995) ‘A study of cross-validation and bootstrap for accuracy estimation and model selection’, Proceedings of the 15th International Joint Conference on Artificial Intelligence, pp.1137–1145.

Kolb, B.M. (1997) ‘Pricing as the key to attracting students to the performing arts’, Journal of Cultural Economics, Vol. 21, No. 2, pp.139–146.


Kotler, P. (1972) Marketing Management: Analysis, Planning and Control, 2nd ed., New Jersey: Prentice Hall.

Kotler, P. and Scheff, J. (1997) Standing Room Only: Strategies for Marketing the Performing Arts, Boston: Harvard Business School Press.

McCarthy, K.F. and Jinnett, K. (2001) A New Framework for Building Participation in the Arts, RAND Corporation, Santa Monica, California.

Quinlan, J.R. (1994) ‘Improved use of continuous attributes in C4.5’, Journal of Artificial Intelligence Research, Vol. 4, pp.77–90.

Semenik, R.J. and Young, C.E. (1997) ‘Market segmentation in arts organizations’, in N. Beckwith et al. (Eds.) Proceedings of the American Marketing Association Educators’ Conference, pp.474–478.

Someren, M.V. and Urbancic, T. (2005) ‘Applications of machine learning: matching problems to tasks and methods’, The Knowledge Engineering Review, Vol. 20, pp.363–402.

Sun, B. (2006) ‘Technology innovation and implications for customer relationship management’, Marketing Science, Vol. 25, No. 6, pp.594–599.

Tajtáková, M. and Arias-Aranda, D. (2008) ‘Targeting university students in audience development strategies for opera and ballet’, The Service Industries Journal, Vol. 28, No. 2, pp.179–191.

Tajtáková, M., Klepochová, D. and Žák, Š. (2005) ‘The attitudes of students towards opera and ballet: attendance, motivations, barriers and expectations’, The 8th International Conference on Arts & Cultural Management (AIMAC), Montréal, Canada, 3–6 July, p.75.

Turrini, A. (2002) ‘Audience attendance in performing arts as a Markov process’, 12th ACEI (Association for Cultural Economics International) Conference, Rotterdam, The Netherlands.

Wiggins, J. (2004) ‘Motivation, ability and opportunity to participate: a reconceptualization of the RAND model of audience development’, International Journal of Arts Management, Vol. 7, No. 1, pp.22–33.

Yankelovich, D. (1964) ‘New criteria for market segmentation’, Harvard Business Review, March-April, pp.83–90.

Bibliography

Blum, A.L. and Langley, P. (1997) ‘Selection of relevant features and examples in Machine Learning’, Artificial Intelligence, Vol. 97, pp.245–271.

Fisher, T.C.G. and Preece, S.B. (2002) ‘Evaluating performing arts audience overlap’, International Journal of Arts Management, Vol. 4, No. 3, pp.20–32.

Ripley, B.D. (1997) Pattern Recognition and Neural Networks, Cambridge University Press.

Sorjonen, H. (2002) ‘Market orientation in the context of performing arts organizations’, 12th ACEI (Association for Cultural Economics International) Conference, Rotterdam, The Netherlands.

Stokmans, M. (2005) ‘MAO-model of audience development: some theoretical elaborations and practical consequences’, 8th International Conference of Arts and Cultural Management (A.I.M.A.C.), Montréal, Canada.

Tajtáková, M. (2004) Report on Audience Survey at Opera and Ballet of the Slovak National Theatre, Bratislava: Slovak National Theatre.

Tajtáková, M. (2007) Stratégie rozvíjania publika v interpretačných umeniach. (trans. Audience Development Strategies in the Performing Arts), Bratislava: EKONÓM, ISBN: 978-80-225-2394-3.

Turrini, A. (2006) ‘Measuring audience addiction to the arts: the case of an Italian Theatre’, International Journal of Arts Management, Vol. 8, No. 3, pp.43–53.
