Learning Classifier Systems and behavioural animation
of virtual characters
Stéphane Sanchez, Hervé Luga, Yves Duthen
Université Toulouse 1/IRIT, Allées de Brienne, 31042 Toulouse cedex, France
{sanchez, luga, duthen}@irit.fr
Abstract. Producing intuitive systems for directing virtual actors is one of the
major objectives of research in virtual animation. It is therefore interesting to
design systems for the behavioral animation of autonomous characters, able to
correctly fulfill directives from a human user according to their goals and
their perception of the virtual environment. Common ways to generate the
behaviors of such virtual characters usually rely on deterministic algorithms
(scripts or automatons). The autonomy of the characters is thus a fixed routine
that cannot adapt to novelty or to any situation not previously considered. To
make these virtual actors capable of adaptation, we propose to combine a
behavioral framework (ViBes) with an evolutionary learning system, the
Learning Classifier System.
Introduction
The production of intuitive systems for the direction of synthetic actors is a major
objective of virtual story-telling research. Most existing systems use detailed
scenarios based upon scripts [16, 21] or automatons [5] associated with reactive
systems to animate the virtual characters. This association allows accurate control
of the characters' moves and behaviors. The results of such an approach are
generally convincing, as the virtual characters are usually perceived as intelligent
and autonomous: they are able to plan a more or less complex sequence of actions
according to the requests of the human director and to the virtual world they are
situated in. However, due to the use of scripts or automatons, the computed
behaviors are mainly deterministic and, in certain cases, they may fail to fulfill
the director's expectations. Indeed, if an unexpected situation occurs, a
deterministic behavior cannot propose a correct action according to the director's
order. The virtual actors cannot adapt to unexpected situations or novelty.
To correct this shortcoming, our intention is to allow virtual characters to learn how
to solve new tasks from their own knowledge and their perception of their environment.
This approach to creating autonomous virtual characters (that can operate in various
virtual reality applications) is interesting. Indeed, while virtual humans can share
similar simple ways to interact with their environment (human-like actions such as
"walk", "grab an object", "push something", etc.), it rapidly becomes necessary to
combine these elementary actions to compute the numerous complex behaviors
required by a virtual reality application. The manual conception of such behaviors
quickly becomes complex because the programmer, whatever modeling technique he uses,
must consider from the start every situation that the virtual character may encounter
within the virtual environment. When this environment is both complex and
dynamic, considering every situation can be extremely difficult. Using learning
systems, especially systems that are both adaptive and dynamic as classifier
systems are, can lighten the cognitive work of the programmer, who only has to focus
on the relevant data of the task to solve and on the key situations that can produce a
learning reward. The planning of the actions (or the mechanism to solve a specific
task) will be deduced by the virtual character itself according to its perception of its
environment and the rewards it receives.
In this paper, we first present the evolutionary system that we use as a learning
method, the classifier system. Then, after underlining its interest for behavioral
animation, we show how we associate it with a simulation system in order to
dynamically generate the behaviors of human-shaped virtual characters.
Learning Classifier Systems
Learning Classifier Systems (LCS) are evolutionary systems that apply a Genetic
Algorithm (GA) to a population of condition/action production rules, the
classifiers, in order to identify and reinforce a specific sub-population of rules that
are able to cooperate to fulfill a specific task. John Holland [7] presented these
systems as a framework that uses a Genetic Algorithm to study learning in systems
based upon condition/action rules. His work rests on two main ideas.
The first is that the Darwinian theory of the survival of the fittest individual
can be used to condition the adaptation of an artificial system to an unknown
environment. The second is that an agent can learn to perform a task from its
interactions with a partially known environment and from the maximization of the
rewards it receives according to its actions.
A classifier is a formalized and generalized representation of an "IF condition
THEN action" rule. It consists of a condition part, an action part and an associated
scalar value, its strength, that indicates its performance, or utility, in the resolution
of the task submitted to the classifier system (figure 1). The performance of a
classifier is usually considered as a prediction of the reward to come from the
execution of its action part in the environment.
Figure 1: a classifier
The condition of a classifier represents the current state of the environment. It is a
fixed-length ternary string over {0,1,#}. The action part and the inputs from the
environment are fixed-length binary strings1.
1 The use of different alphabets to encode inputs and actions allows the
generalization of production rules. Indeed, '#' is a wildcard that matches either a
'1' or a '0' in the input. E.g. the input 010 is matched by these conditions: ###,
##0, 0##, #1#, #10, 0#0, 01#, 010.
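This matching rule can be sketched in a few lines of Python (an illustration on our part, not code from the original system):

```python
def matches(condition: str, state: str) -> bool:
    """Return True if a ternary condition matches a binary input state.

    '#' is a wildcard that matches either '0' or '1'; every other
    position must agree exactly.
    """
    return len(condition) == len(state) and all(
        c == '#' or c == s for c, s in zip(condition, state)
    )

# The input 010 is matched by ###, ##0, 0##, #1#, #10, 0#0, 01#, 010,
# but not, for instance, by 0#1.
```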
There are many variants of the original LCS proposed by Holland. In this
project, we chose to focus on a simplified version of LCS, called XCS (eXtended
Classifier System), presented by Wilson in 1995 [15]. Unlike the original LCS or
their main evolution (ZCS, Zeroth level Classifier System [14]), this system is
accuracy-based instead of strength-based. It thus avoids the main problem of the
original LCS and ZCS, namely the dominance of strong overgeneral rules (rules
with more #'s in their condition that have a greater strength than their more
specific counterparts but do not perform as correctly) [22]. This dominance
prevents the system from exploring potentially better specific rules, and the
presence of totally generalized rules (###::01 for example) destabilizes the decision
process since they are systematically activated.
Figure 2: Schematic illustration of XCS. Sensors and effectors connect the system to
the environment; the input is matched against the population [P] to form the match
set [M] (completed by covering if needed), an action set [A] is selected, and the
reward and the GA update the classifiers.
Learning Classifier Systems operate in cycles (figure 2):
1. The LCS receives an input from the environment through its sensors.
2. The LCS finds the rules which match the input (the match set [M]). A covering
operator may be used: it generates new rules on the fly from an input that no
existing classifier matches.
3. The matching rules may not advocate the same action: conflict resolution occurs
and an action is chosen according to the utility of the classifiers involved. Usually,
the action of the strongest classifier is chosen. The tradeoff between doing what is
currently best (exploitation) and trying something new which might be better
(exploration) is realized using a selection method from GAs such as roulette-wheel
or tournament selection (both methods favor the strongest classifiers but can choose
an action from a less performing classifier). The rules advocating the chosen action
form the action set [A].
4. The LCS performs the chosen action and receives feedback on it (a reward from
the environment).
5. The rule utility estimates are updated based on this feedback, typically using the
Bucket Brigade or Q-Learning algorithms. This ensures the reinforcement learning of
the system.
6. The rule discovery system (the Genetic Algorithm) may be called. Its function is
to generate new (hopefully useful) rules and to delete the less useful ones (the utility
estimates are usually used as the selection criterion).
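The cycle above can be sketched as follows. This is a simplified illustration (single-step reward, Widrow-Hoff style utility update, roulette-wheel conflict resolution) rather than a full XCS; all names and constants are ours:

```python
import random

class Classifier:
    def __init__(self, condition, action, utility=10.0):
        self.condition = condition  # ternary string over {0,1,#}
        self.action = action        # binary string
        self.utility = utility      # prediction of the reward to come

    def matches(self, state):
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

def lcs_cycle(population, state, execute, beta=0.2, wildcard_prob=1.0/3):
    # Steps 1-2: build the match set [M]; cover if no rule matches.
    match_set = [cl for cl in population if cl.matches(state)]
    if not match_set:
        covered = Classifier(
            ''.join('#' if random.random() < wildcard_prob else s for s in state),
            random.choice(['00', '01', '10', '11']))
        population.append(covered)
        match_set = [covered]
    # Step 3: roulette-wheel conflict resolution on utility, which
    # favours strong rules but still explores weaker ones.
    total = sum(cl.utility for cl in match_set)
    pick = random.uniform(0, total)
    for cl in match_set:
        pick -= cl.utility
        if pick <= 0:
            action = cl.action
            break
    action_set = [cl for cl in match_set if cl.action == action]
    # Steps 4-5: execute the action, collect the reward, and reinforce
    # the action set [A] with a Widrow-Hoff style update (rate beta).
    reward = execute(action)
    for cl in action_set:
        cl.utility += beta * (reward - cl.utility)
    return action
```

Step 6 (the GA over the population) is omitted here for brevity.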
Learning Classifier Systems and behavioral animation
In the context of behavioral animation of virtual characters, LCS are first
interesting for they fulfill most of the constraints related to the conception of decision
systems of cognitive agents [12]:
1. The population of classifiers represents the knowledge of the agent. Rules are of
the condition/action kind and are encoded using a minimal alphabet. So they can
express in a symbolic way the various situations of the environment and link to each
of these situations a more or less relevant action (the relevance of an action being its
utility estimate).
2. The LCS main loop is classically a perception-decision-action loop. A rule
advocates an action if it matches the input from the environment. The action is then
executed through effectors (see figure 2).
3. XCS (and ZCS) are reactive systems: they both produce an action for each input
processed.
4. The association of selection mechanisms based upon the utility of rules according
to the environmental situation, and of reinforcement algorithms that update this
utility by back-propagating the rewards from the environment, allows both the
learning and the persistence of the most effective behaviors. Besides, as the selection
algorithms are mainly stochastic, the system favors the best actions without
necessarily avoiding trying new ways to fulfill its goal or task.
5. The combined actions of an operator that generates rules on the fly according to
the situation (covering) and of a genetic algorithm that generates potentially better
rules ensure the adaptability of the decision system.
6. LCS are able to simulate purely reactive behaviors as well as to plan long
sequences of actions.
Secondly, LCS can effectively solve tasks and plan sequences in complex
environments such as the ones our virtual characters are situated in. Booker [1]
defined such environments as noisy and dynamic (they continuously generate new
situations that can be partially or incompletely perceived). They also require a
continuous production of new actions, often in real time. The goals of the agents are
either implicit or partially defined, and the relations between the different situations
that lead to their fulfillment are both complex and discontinuous. Last, the
environmental feedback is not systematic and can occur only after long sequences of
actions (which is totally compatible with the credit assignment scheme of LCS, in
particular the Bucket Brigade algorithm).
Lastly, an LCS is a compact and complete behavioral system, for it consists of
sensors, a set of decision rules and specific effectors. Thus, it is easy to integrate an
LCS, as a specific behavioral module able to solve a task, into a framework dedicated
to the behavioral animation of virtual characters.
Classifier systems have been applied in numerous domains involving autonomous
agents such as robotics [2, 3, 19], animats [15, 9] and economic simulations [13, 20].
However, they have rarely been used in the context of autonomous virtual
characters. Among the few exceptions, we found the work of Sanza [18] about the
realistic simulation of soccer, where the virtual players are controlled by individual
classifier systems, and the work of Heguy [6] about the simulation of basketball
players and their manager. These two works, while focused on the generation of
collective behaviors, have shown the viability of classifier systems to simulate
autonomous virtual entities.
Learning of individual behaviors
In this paper, we propose to use a classifier system to automatically generate the
high-level behaviors of a virtual human actor that evolves in a realistic environment.
In this perspective, we chose to first implement a behavioral framework, ViBes2, that
gives our actors their perceptive abilities and their elementary interactive behaviors
(walk, grab, push, open, etc.). We then introduce classifier systems into this
behavioral framework as a learning component, to make the actors learn to use their
current knowledge and abilities in order to fulfill more "high level" tasks such as
"eat something".
ViBes framework
ViBes (figure 3) is an individual cognitive framework that enables a situated agent
(a virtual actor) to choose, according to its goals and its perception, which elementary
actions ("walk", "grab", etc.) it must perform in order to fulfill a more or less
complex order given by a human user. This framework is the simulation
environment required for the introduction of classifier systems as learning tools.
The ViBes framework consists of four main parts. The perception system groups a
set of virtual sensors that reproduce human senses (mainly sight and, partially,
hearing) and low-level cognitive sensors such as proximity sensors. The knowledge
manager stores everything the virtual actor knows about its environment. It also
maintains a register of all its interactive abilities and monitors the internal
parameters of the virtual character. The instructions sequencer ensures the
communication between ViBes and any virtual reality application connected to it. Its
main function is to coherently manage the orders from the application and the
successive activations of the behaviors stored in the decision system. The decision
system is a hierarchical network of behavioral modules. Its function is to generate a
behavior by selecting, according to the perception and the current goal, an action that
will activate either an elementary interaction in the virtual environment or another
behavioral module. Each behavioral module can fulfill a unique and specific task and
can be programmed using any kind of decision-making method (scripts, automatons,
LCS, etc.).
2 ViBes: Virtual Behaviors
Figure 3: The ViBes framework. The perception system, the knowledge manager,
the instructions sequencer and the decision system link the virtual environment and
the user to the actions of the actor.
The completion strategy, inspired by Minsky's work [10], and in accordance with
the bottom-up architecture of the decision-making process, is similar to the one used
in dynamic story-telling systems such as Hierarchical Task Networks [23]: the task to
complete is decomposed into a set of subtasks (usually said to have a lower level of
abstraction, the lowest one being a simple reactive action such as "make a step")
that must be fulfilled until the completion of the main one.
Figure 4: Subdivision of "GoTo" task
Figure 5: Processing diagram of "GoTo" behavioral module
Figure 4 shows the hierarchical decomposition of the "GoTo" module. This module
is composed of three sub-modules: one to find the path to the destination, another to
make the actor move between the navigation points of the computed path, and the
last to make him remove any unavoidable obstacle. Figure 5 shows how these
modules are combined to make the actor go to the requested destination using the
actions he knows ("walk", "avoid obstacle", "push", "pull" and "open").
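Such a decomposition can be sketched as follows; the class and function names are illustrative assumptions on our part and do not reflect the actual ViBes code:

```python
class BehaviouralModule:
    """A node of the decision system: it fulfils one specific task by
    emitting an elementary action or delegating to a sub-module."""
    def __init__(self, name):
        self.name = name

    def step(self, state, goal):
        raise NotImplementedError

class GoTo(BehaviouralModule):
    """'GoTo' decomposed into path-finding, path-following and
    obstacle removal, following figure 4."""
    def __init__(self, find_path, follow_path, remove_obstacle):
        super().__init__("GoTo")
        self.find_path = find_path
        self.follow_path = follow_path
        self.remove_obstacle = remove_obstacle

    def step(self, state, goal):
        if state.get("path") is None:
            return self.find_path(state, goal)        # compute waypoints first
        if state.get("blocked"):
            return self.remove_obstacle(state, goal)  # push / pull / open
        return self.follow_path(state, goal)          # walk to next waypoint

# Sub-modules are stubbed here as functions returning elementary actions:
goto = GoTo(lambda s, g: "find path",
            lambda s, g: "walk",
            lambda s, g: "open door")
```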
Introducing Classifier Systems in ViBes
The main idea behind introducing classifier systems into ViBes is that it is possible
to let the virtual actor learn how to coherently select and combine the different
behavioral modules at its disposal, in order to generate a plan of actions that fulfills a
newly allocated task.
The learning system must be able to choose, among the existing behaviors, the ones
that can fulfill the task, and it must also plan a correct and minimal sequence of
orders to achieve its goal. This is ensured by the combined effects of the LCS
selection strategy, the application of the Q-Learning principle and the rewarding
scheme. Given a configuration of the virtual world, the LCS selects the most suitable
production rule (i.e. the classifier with the highest utility estimate) to generate the
next action to execute. The Q-Learning principle allows the evaluation of the
consequences of this action according to the task to complete and the rules of the
virtual world. A correct action implies a positive mark while an incorrect one
produces a penalty. This reward, or penalty, is added to the strength of the classifier
that produced the action. As the LCS proceeds through its learning phase, the
selection rate of good actions increases (because of the reinforcement of the utility of
the corresponding production rules) while the incorrect ones are more and more
ignored. Finally, the rewarding system grants a bonus according to the number of
steps required to achieve the intended goal. Granting a higher bonus to the shortest
plans of actions allows the LCS to converge towards a minimal sequence that avoids
inappropriate or useless sub-sequences.
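A minimal sketch of this rewarding scheme, with illustrative constants (the actual marks and bonus values used in the system are not specified in the paper):

```python
def terminal_reward(base_mark, steps, max_steps, bonus=100.0):
    """Reward granted when the task is fulfilled: a fixed base mark plus
    a bonus that is higher for shorter plans, so that the LCS converges
    towards a minimal sequence of actions."""
    return base_mark + bonus * (max_steps - steps) / max_steps

def reinforce(utility, reward, beta=0.2):
    """Widrow-Hoff style update of a classifier's utility estimate:
    a positive mark raises it, a penalty lowers it."""
    return utility + beta * (reward - utility)
```

For instance, a 5-step plan earns a larger terminal reward than a 15-step plan for the same task, pushing the system away from useless detours.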
Figure 6: Introducing LCS as a behavioral module in ViBes.
Classifier systems are simply introduced into ViBes as behavioral modules (figure 6)
that must evolve to fulfill a new specific task. In such a behavioral module, a
classifier system is introduced as follows.
Environmental situations are processed by the perceptive filters. Their function is to
transform the perceived data into a vector of relevant values that forms the input of
the classifier system according to the task to learn. A generalization of these values
then forms the condition part of the classifiers.
The population of production rules (a.k.a. classifiers) is stored in the solver part of
the module. The solver performs the action selection procedure as well as the
generation of new rules and the reward distribution mechanisms. Unlike in classical
classifier systems, the advocated action is not directly sent to the effectors of the
virtual character. The system first evaluates its validity: an action is valid if all its
preconditions are fulfilled (a virtual actor cannot walk if it is sitting). The action is
then sent to the instructions sequencer to be executed in the virtual environment.
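This validity check can be sketched as a simple precondition test; the predicate table below is an illustrative assumption:

```python
def valid(action, actor_state, preconditions):
    """An advocated action is executed only if all its preconditions
    hold in the current state of the actor; an action with no declared
    precondition is always valid."""
    return all(check(actor_state) for check in preconditions.get(action, []))

# Illustrative precondition table: a sitting actor cannot walk.
preconditions = {"walk": [lambda state: not state["sitting"]]}
```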
The estimation of the reward to allocate is done by the behavioral manager, which
monitors the progress of the task fulfillment.
Lastly, introducing LCS as behavioral modules in ViBes makes it possible to use the
hierarchical structure of the framework to activate several classifier systems in
cascade. Thus, to fulfill a complex task, it is possible to conceive a hierarchical set
of LCS of lesser complexity. This is interesting, for it is a way to avoid one of the
main problems of LCS: the difficulty to form really long sequences of actions.
Results
The first step in the implementation of ViBes was to produce a set of 40
elementary behavioral modules in order to validate the architecture. These modules
are of three kinds: motion modules (path-planning, path-following and collision
avoidance), selection modules (choice of a virtual element according to its features)
and interactive modules (grab, push, throw, etc.). ViBes was then integrated into the
V-Man industrial project, and this framework has enabled the animation, under
human directions, of a virtual actor in a virtual world filled with interactive objects.
The second part of the project was the validation of LCS as behavior-producing
tools. In this part, we proposed to make virtual actors living in a virtual house learn
everyday tasks. Their goals are to satisfy elementary needs such as "feed yourself",
"divert yourself" or "rest yourself".
In this paper, we focus on the "feed yourself" problem. In this case, we chose to
implement two LCS. The first one is the "EatSomething" behavioral module (its goal
is to let the actor find some food, cook it if necessary and eat it), and the second one
is the "CookFood" module (its goal is to make the actor cook its food). The first
module can eventually trigger the activation of the second one. In this part, we only
show the implementation and the results of the "CookFood" LCS.
The LCS we used is not a classical XCS but a new extension called GHXCS3. The
particularity of this LCS is to replace the usual binary and ternary strings that encode
inputs, conditions and actions with heterogeneous vectors of typed genes (bit, trit,
integer, real, list, etc.). This simplifies the implementation of new learning problems,
for it is no longer necessary to translate the relevant perceived data into a binary
encoding.
3 Generic Heterogeneous Classifier Systems [17]
This necessary step in the implementation of problems with LCS is shown in figure
7: from the relevant data observed in the virtual environment, we create a situation
vector (the input). The generalization of each term of this vector creates the
condition part of a referring classifier. The action part is an integer index that
indicates the instruction to advocate4. This instruction can either be a trigger of
another behavioral module or an elementary interaction to perform in the virtual
environment. Eventually, a parameter that is specific to a particular indexed
instruction can be added to the action part. The implementation of the problem is
finalized by the determination of its stopping condition. In the case of the
"CookFood" problem, the task is fulfilled when the food is cooked and the cooking
device is off. The reward is given to the learning system at this point.
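The typed-gene encoding can be illustrated as follows. The gene types shown (wildcard, numeric interval, exact value) and the situation vector are our assumptions for illustration, not the published GHXCS interface:

```python
def gene_matches(gene, value):
    """One typed gene generalises one perceived value: None plays the
    role of '#', a (lo, hi) tuple is a numeric interval, and anything
    else must match the value exactly."""
    if gene is None:
        return True
    if isinstance(gene, tuple):
        lo, hi = gene
        return lo <= value <= hi
    return gene == value

def condition_matches(condition, situation):
    """A condition is a fixed-length vector of typed genes that must
    all match the corresponding terms of the situation vector."""
    return len(condition) == len(situation) and all(
        gene_matches(g, v) for g, v in zip(condition, situation))

# Hypothetical situation vector: (food grabbed?, food cooked?, device, device on?)
situation = (1, 0, "oven", 0)
condition = (1, 0, None, (0, 0))  # food grabbed and raw, any device, device off
```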
Figure 7: Implementation of the "CookFood" problem.
Once the system has been initialized with the referring classifier, the virtual actor is
set in a learning situation (inside the house, with the objective to feed itself). Once
the learning is considered done (the actor fulfills its task whatever the starting
situation), we can analyze the population of classifiers and extract the best rules in
order to represent the new behavior as an automaton. Figure 8 shows the result
obtained for the "CookFood" problem.
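Extracting the automaton amounts to keeping, for each distinct condition, the rule with the highest utility; a minimal sketch, with illustrative rule contents:

```python
def extract_best_rules(population):
    """Keep, for each condition, the classifier with the highest
    utility; the resulting condition -> action map is the learned
    behavior, ready to be drawn as an automaton."""
    best = {}
    for condition, action, utility in population:
        if condition not in best or utility > best[condition][1]:
            best[condition] = (action, utility)
    return {cond: act for cond, (act, _) in best.items()}

# Hypothetical excerpt of a learned population for "CookFood":
rules = [("grabbed,raw", "open oven", 95.0),
         ("grabbed,raw", "use stove", 40.0),
         ("in oven,raw", "switch on", 90.0)]
```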
4 For clarity of presentation, in figure 7 we have limited the indexation of
instructions to the ones relevant to the "CookFood" problem. In reality, this
indexation considers all the behaviors the actor can perform.
Figure 8: "CookFood" problem automaton.
This automaton shows several interesting facts. First, the virtual actor has learned to
accomplish the requested task, and the generated behavior is both realistic and
coherent considering our expectations: if the actor chooses to cook its food in the
oven, then it puts the food in the oven (taking care to open it beforehand if
necessary), closes it, switches it on and waits until the cooking is done. At the end,
the actor switches the oven off. Second, we can see that the actor has learned two
alternative plans: one using the stove and one using the oven. Last, for each of these
plans, the learned strategy is always the shortest one according to the actor's
capacities and knowledge. Besides, the actor avoids any non-relevant action
considering the task to solve.
The two LCS ("EatSomething" and "CookFood") were introduced into the V-Man
application and the following character animation was produced (figure 9).
Figure 9: 1- The virtual character on the left is hungry and goes towards the
kitchen. 2- It opens the closed door of the kitchen. 3- The character grabs a raw apple.
4- It decides to cook it in the oven. 5- It waits for the end of the cooking. 6- The
character eats the cooked apple. 7- The character throws away the remains of the apple.
Conclusion
When designing autonomous virtual characters, one of the major difficulties is the
establishment of the behaviors of these entities. Indeed, in a realistic environment,
both complex and dynamic, considering every situation which a virtual character
may encounter while trying to fulfill a specific task is a difficult and constraining
work (but a necessary one if we intend to completely describe a behavior using
scripts or automatons). The results that we obtained during the integration of
classifier systems into a functional behavioral architecture showed us that this
method of evolutionary training makes it possible to generate new coherent
individual behaviors from existing ones. Moreover, it proved that these systems
make it possible to quickly extend the functionalities of the synthetic actors of the
V-Man application at lower cost (the total duration of the implementation and
training of a new problem by a confirmed user being approximately an hour and a
half). However, one should not lose sight of the fact that classifier systems are
evolutionary systems and, consequently, convergence towards a suitable base of
rules can be difficult to obtain (it may be difficult for a non-expert user to adequately
tune the different parameters and subsystems involved in LCS convergence). Lastly,
within the framework of the V-Man application, we restricted the use of LCS to the
learning of non-trivial but still relatively simple individual tasks. It seems necessary
to extend our study to more complex tasks, and to consider the use of LCS for the
learning of collective behaviors (simulation of missions, team games, social systems,
etc.).
References
1. Booker L.B., Goldberg D.E., and Holland J.H. “Classifier Systems and Genetic
Algorithms”. Artificial Intelligence, 40 :235-282, 1989.
2. Bonasso P.R., Firby J.R., Gat E., Kortenkamp D., Miller D.P. and Slack M.G. “Experiences
with an architecture for intelligent, reactive agents”. Journal of Experimental and
Theoretical Artificial Intelligence, Vol 9(1), pp 237-256, 1997.
3. Butz M.V., Goldberg D.E., and Stolzmann W. “New challenges for an ACS: Hard
problems and possible solutions” in Technical Report, University of Illinois at
Urbana-Champaign, Number 99019, October 1999.
4. Charles, F., Cavazza, M., and Mead, S.J.: Generating Dynamic Storylines Through
Characters’ Interactions. International Journal on Intelligent Games and
Simulation, vol. 1, no. 1, p. 5- 11,March, 2002.
5. Donikian S. “HPTS: a behaviour modelling language for autonomous agents” in
Proceedings of the Fifth International Conference on Autonomous Agents, pp. 401-408,
ACM Press, May 2001.
6. Heguy O. “Architecture comportementale pour l’émergence d’activités coopératives en
environnement virtuel”. Thèse de Doctorat, Université Paul Sabatier, Toulouse
(France), December 2003.
7. Holland J. H. “Adaptation” in R. Rosen and F. M. Snell, editors, Progress in Theoretical
Biology. New York: Plenum, 1976.
8. Thalmann D. and Kallmann M. “Direct 3D Interaction with Smart Objects”, November
1999.
9. Lattaud C. “Non-homogenous classifier systems in a macro-evolution process” in 2nd
International Workshop on Learning Classifier Systems, pp. 266-271, 13 July 1999.
10. Minsky M. “The society of mind”. Simon and Schuster Inc., New York, NY. 1985
11. Menou E., Philippon L., Sanchez S., Duchon J., Balet O., "The V-Man Project: towards
autonomous virtual characters", Second International Conference on Virtual
Storytelling, Published in Lecture Notes in Computer Science, Springer, Vol. 2897,
2003.
12. Newell A. and Simon H.A. “Computer Science as Empirical Enquiry”, Communications of
the ACM, Vol. 19, pp. 113-126, 1976.
13. Olivier V., "Economie et action : un modèle évolutionniste d'apprentissage technologique",
Université des Sciences Sociales, June 1996.
14. Wilson S.W. “ZCS: A zeroth level classifier system”. Evolutionary Computation, 2(1): 1-18,
1994.
15. Wilson S.W. “Classifier Fitness Based on Accuracy”. Evolutionary Computation, 3(2): 149-
175, 1995.
16. Perlin K., Goldberg A. “Improv: a system for scripting interactive actors in virtual worlds”,
proceedings of SIGGRAPH’96, 1996, New Orleans, 205-216
17. Sanchez S., « Mécanismes évolutionnistes pour la simulation comportementale d’humains
virtuels », Thèse de Doctorat, Université Toulouse 1 Sciences Sociales (Toulouse),
December 2004.
18. Sanza C., « Evolution d'entités virtuelles coopératives par systèmes de classifieurs », Thèse
de Doctorat, Université Paul Sabatier (Toulouse), June 2001.
19. Stolzmann W. “Latent learning in Khepera robots with Anticipatory Classifier Systems” in
International Workshop on Learning Classifier Systems (2.IWLCS) on the Genetic
and Evolutionary Computation Conference (GECCO-99), pp 290-297, Orlando, Florida,
1999.
20. Schulenburg S., Ross P. "Strength and Money: An LCS Approach to Increasing Returns",
IWLCS’2000, Third International Workshop on Learning Classifier Systems, Paris,
September 2000.
21. Thalmann D., Musse S.R. and Kallmann M. “Virtual Humans’ Behavior: Individuals,
Groups and Crowds”, Proceedings of Digital Media Futures International Conference,
Bradford (United Kingdom), 1999.
22. Kovacs T. “Strength or Accuracy: Credit assignment” in Learning Classifier Systems,
Springer, 2004.