Learning Classifier Systems and behavioural animation
of virtual characters
Stéphane Sanchez, Hervé Luga, Yves Duthen
Université Toulouse 1/IRIT, Allées de Brienne, 31042 Toulouse cedex, France
{sanchez, luga, duthen}@irit.fr
Abstract. Producing intuitive systems for directing virtual actors is one of the
major objectives of research in virtual animation. It is therefore interesting to
design systems for the behavioral animation of autonomous characters, able to
correctly fulfill directives from a human user according to their goals and
their perception of the virtual environment. Common ways to generate the
behaviors of such virtual characters usually rely on deterministic algorithms
(scripts or automatons). The autonomy of the characters is thus a fixed routine
that cannot adapt to novelty or to any situation not previously considered. To
make these virtual actors capable of adaptation, we propose to combine a
behavioral framework (ViBes) with an evolutionary learning system, the
Learning Classifier System.
Introduction
The production of intuitive systems for the direction of synthetic actors is a major
objective of virtual story-telling research. Most existing systems use detailed
scenarios based upon scripts [16, 21] or automatons [5] associated with reactive
systems to animate the virtual characters. This association allows accurate control
of the characters' moves and behaviors. The results of such an approach are
generally convincing, as the virtual characters are usually perceived as intelligent
and autonomous: they are able to plan a more or less complex sequence of actions
according to the requests of the human director and to the virtual world they are
situated in. However, due to the use of scripts or automatons, the computed
behaviors are mainly deterministic and, in certain cases, they may fail to fulfill
the director's expectations. Indeed, if an unexpected situation occurs, a
deterministic behavior cannot propose a correct action according to the director's
order. The virtual actors cannot adapt to unexpected situations or novelty.
To correct this shortcoming, our intention is to allow virtual characters to learn how
to solve new tasks from their own knowledge and their perception of their environment.
This approach to creating autonomous virtual characters (that can operate in various
virtual reality applications) is interesting. Indeed, while virtual humans can share
similar simple ways to interact with their environment (human-like actions such as
"walk", "grab an object", "push something", etc.), it rapidly becomes necessary to
combine these elementary actions to compute the numerous complex behaviors
required by a virtual reality application. The manual conception of such behaviors
quickly becomes complex because the programmer, whatever modeling technique he uses,
must consider from the start every situation that the virtual character may encounter
within the virtual environment. When this environment is both complex and
dynamic, considering every situation can be extremely difficult. Using learning
systems, especially systems that are both adaptive and dynamic as classifier
systems are, can lighten the cognitive work of the programmer, who only has to focus
on the relevant data of the task to solve and on the key situations that can produce a
learning reward. The planning of the actions (or the mechanism to solve a specific
task) will be deduced by the virtual character itself according to its perception of its
environment and the rewards it receives.
In this paper, we first present the evolutionary system that we use as a learning
method, the classifier system. Then, after underlining its interest for behavioral
animation, we show how we associate it with a simulation system in order to
dynamically generate the behaviors of human-shaped virtual characters.
Learning Classifier Systems
Learning Classifier Systems (LCS) are evolutionary systems that apply a Genetic
Algorithm (GA) to a population of condition/action production rules, the
classifiers, in order to identify and reinforce a specific sub-population of rules that
are able to cooperate to fulfill a specific task. John Holland [7] presented these
systems as a framework that uses a Genetic Algorithm to study learning in systems
based upon condition/action rules. His work rests on two main ideas.
The first is that the Darwinian theory of the survival of the fittest individual
can be used to condition the adaptation of an artificial system to an unknown
environment. The second is that an agent can learn to perform a task from its
interactions with a partially known environment and from the maximization of the
rewards it receives according to its actions.
A classifier is a formalized and generalized representation of an "IF condition
THEN action" rule. It consists of a condition part, an action part and an associated
scalar value, its strength, that indicates its performance, or utility, in the resolution
of the task submitted to the classifier system (figure 1). The performance of a
classifier is usually considered as a prediction of the reward to come from the
execution of its action part in the environment.
Figure 1: a classifier
The condition of a classifier represents the current state of the environment. It is a
fixed-length ternary string over {0,1,#}. The action part and the inputs from the
environment are fixed-length binary strings1.
1 The use of different alphabets to encode inputs and actions allows the
generalization of production rules. Indeed, '#' is a wildcard that matches either a
'1' or a '0' in the input. E.g. the input 010 is matched by these conditions: ###,
##0, 0##, #1#, #10, 0#0, 01#, 010.
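This matching rule can be sketched in a few lines of Python (an illustration on our part, not code from the original system):

```python
def matches(condition: str, state: str) -> bool:
    """Return True if a ternary condition matches a binary input state.

    '#' is a wildcard that matches either '0' or '1'; every other
    position must agree exactly.
    """
    return len(condition) == len(state) and all(
        c == '#' or c == s for c, s in zip(condition, state)
    )

# The input 010 is matched by ###, ##0, 0##, #1#, #10, 0#0, 01#, 010,
# but not, for instance, by 0#1.
```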
There are many variants of the original LCS proposed by Holland. In this
project, we chose to focus on a simplified version of LCS, called XCS (eXtended
Classifier System), presented by Wilson in 1995 [15]. Unlike the original LCS or
their main evolution (ZCS, Zeroth level Classifier System [14]), this system is
accuracy-based instead of strength-based. It thus avoids the main problem of the
original LCS and ZCS, namely the dominance of strong overgeneral rules (rules
with more #'s in their condition that have a greater strength than their more
specific counterparts but do not perform as correctly) [22]. This dominance
prevents the system from exploring potentially better specific rules, and the
presence of totally generalized rules (###::01 for example) destabilizes the decision
process since they are systematically activated.
Figure 2: Schematic illustration of XCS. Sensors and effectors connect the system to
the environment; the input is matched against the population [P] to form the match
set [M] (completed by covering if needed), an action set [A] is selected, and the
reward and the GA update the classifiers.
Learning Classifier Systems operate in cycles (figure 2):
1. The LCS receives an input from the environment through its sensors.
2. The LCS finds the rules which match the input (the match set [M]). A covering
operator may be used: it generates new rules on the fly from an input that no
existing classifier matches.
3. The matching rules may not advocate the same action: conflict resolution occurs
and an action is chosen according to the utility of the classifiers involved. Usually,
the action of the strongest classifier is chosen. The tradeoff between doing what is
currently best (exploitation) and trying something new which might be better
(exploration) is realized using a selection method from GAs such as roulette-wheel
or tournament selection (both methods favor the strongest classifiers but can choose
an action from a less performing classifier). The rules advocating the chosen action
form the action set [A].
4. The LCS performs the chosen action and receives feedback on it (a reward from
the environment).
5. The rule utility estimates are updated based on this feedback, typically using the
Bucket Brigade or Q-Learning algorithms. This ensures the reinforcement learning of
the system.
6. The rule discovery system (the Genetic Algorithm) may be called. Its function is
to generate new (hopefully useful) rules and to delete the less useful ones (the utility
estimates are usually used as the selection criterion).
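The cycle above can be sketched as follows. This is a simplified illustration (single-step reward, Widrow-Hoff style utility update, roulette-wheel conflict resolution) rather than a full XCS; all names and constants are ours:

```python
import random

class Classifier:
    def __init__(self, condition, action, utility=10.0):
        self.condition = condition  # ternary string over {0,1,#}
        self.action = action        # binary string
        self.utility = utility      # prediction of the reward to come

    def matches(self, state):
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

def lcs_cycle(population, state, execute, beta=0.2, wildcard_prob=1.0/3):
    # Steps 1-2: build the match set [M]; cover if no rule matches.
    match_set = [cl for cl in population if cl.matches(state)]
    if not match_set:
        covered = Classifier(
            ''.join('#' if random.random() < wildcard_prob else s for s in state),
            random.choice(['00', '01', '10', '11']))
        population.append(covered)
        match_set = [covered]
    # Step 3: roulette-wheel conflict resolution on utility, which
    # favours strong rules but still explores weaker ones.
    total = sum(cl.utility for cl in match_set)
    pick = random.uniform(0, total)
    for cl in match_set:
        pick -= cl.utility
        if pick <= 0:
            action = cl.action
            break
    action_set = [cl for cl in match_set if cl.action == action]
    # Steps 4-5: execute the action, collect the reward, and reinforce
    # the action set [A] with a Widrow-Hoff style update (rate beta).
    reward = execute(action)
    for cl in action_set:
        cl.utility += beta * (reward - cl.utility)
    return action
```

Step 6 (the GA over the population) is omitted here for brevity.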
Learning Classifier Systems and behavioral animation
In the context of behavioral animation of virtual characters, LCS are first
interesting for they fulfill most of the constraints related to the conception of decision
systems of cognitive agents [12]:
1. The population of classifiers represents the knowledge of the agent. Rules are of
the condition/action kind and are encoded using a minimal alphabet. So they can
express in a symbolic way the various situations of the environment and link to each
of these situations a more or less relevant action (the relevance of an action being its
utility estimate).
2. The LCS main loop is classically a perception-decision-action loop. A rule
advocates an action if it matches the input from the environment. The action is then
executed through effectors (see figure 2).
3. XCS (and ZCS) are reactive systems: they both produce an action for each input
processed.
4. The association of selection mechanisms based upon the utility of rules according
to the environmental situation, and of reinforcement algorithms that update this
utility by back-propagating the rewards from the environment, allows both the
learning and the persistence of the most effective behaviors. Besides, as the selection
algorithms are mainly stochastic, the system favors the best actions without
necessarily avoiding trying new ways to fulfill its goal or task.
5. The combined actions of an operator that generates rules on the fly according to
the situation (covering) and of a genetic algorithm that generates potentially better
rules ensure the adaptability of the decision system.
6. LCS are able to simulate purely reactive behaviors as well as to plan long
sequences of actions.
Secondly, LCS can effectively solve tasks and plan sequences in complex
environments such as the ones our virtual characters are situated in. Booker [1]
defined such environments as noisy and dynamic (they continuously generate new
situations that can be partially or incompletely perceived). They also require a
continuous production of new actions, often in real time. The goals of the agents are
either implicit or partially defined, and the relations between the different situations
that lead to their fulfillment are both complex and discontinuous. Last, the
environmental feedback is not systematic and can occur only after long sequences of
actions (which is totally compatible with the credit assignment scheme of LCS, in
particular the Bucket Brigade algorithm).
Lastly, an LCS is a compact and complete behavioral system, for it consists of
sensors, a set of decision rules and specific effectors. Thus, it is easy to integrate an
LCS, as a specific behavioral module able to solve a task, into a framework dedicated
to the behavioral animation of virtual characters.
Classifier systems have been applied in numerous domains involving autonomous
agents such as robotics [2, 3, 19], animats [15, 9] and economic simulations [13, 20].
However, they have rarely been used in the context of autonomous virtual
characters. Among the few exceptions, we found the work of Sanza [18] about the
realistic simulation of soccer, where the virtual players are controlled by individual
classifier systems, and the work of Heguy [6] about the simulation of basketball
players and their manager. These two works, while focused on the generation of
collective behaviors, have shown the viability of classifier systems to simulate
autonomous virtual entities.
Learning of individual behaviors
In this paper, we propose to use a classifier system to automatically generate the
high-level behaviors of a virtual human actor that evolves in a realistic environment.
In this perspective, we chose to first implement a behavioral framework, ViBes2, that
gives our actors their perceptive abilities and their elementary interactive behaviors
(walk, grab, push, open, etc.). We then introduce classifier systems into this
behavioral framework as a learning component, to make the actors learn to use their
current knowledge and abilities in order to fulfill more "high level" tasks such as
"eat something".
ViBes framework
ViBes (figure 3) is an individual cognitive framework that enables a situated agent
(a virtual actor) to choose, according to its goals and its perception, which elementary
actions ("walk", "grab", etc.) it must perform in order to fulfill a more or less
complex order given by a human user. This framework is the simulation
environment required for the introduction of classifier systems as learning tools.
The ViBes framework consists of four main parts. The perception system groups a
set of virtual sensors that reproduce human senses (mainly sight and, partially,
hearing) and low-level cognitive sensors such as proximity sensors. The knowledge
manager stores everything the virtual actor knows about its environment. It also
maintains a register of all its interactive abilities and monitors the internal
parameters of the virtual character. The instructions sequencer ensures the
communication between ViBes and any virtual reality application connected to it. Its
main function is to coherently manage the orders from the application and the
successive activations of the behaviors stored in the decision system. The decision
system is a hierarchical network of behavioral modules. Its function is to generate a
behavior by selecting, according to the perception and the current goal, an action that
will activate either an elementary interaction in the virtual environment or another
behavioral module. Each behavioral module can fulfill a unique and specific task and
can be programmed using any kind of decision-making method (scripts, automatons,
LCS, etc.).
2 ViBes: Virtual Behaviors
Figure 3: The ViBes framework. The perception system, the knowledge manager,
the instructions sequencer and the decision system link the virtual environment and
the user to the actions of the actor.
The completion strategy, inspired by Minsky's work [10], and in accordance with
the bottom-up architecture of the decision-making process, is similar to the one used
in dynamic story-telling systems such as Hierarchical Task Networks [23]: the task to
complete is decomposed into a set of subtasks (usually said to have a lower level of
abstraction, the lowest one being a simple reactive action such as "make a step")
that must be fulfilled until the completion of the main one.
Figure 4: Subdivision of "GoTo" task
Figure 5: Processing diagram of "GoTo" behavioral module
Figure 4 shows the hierarchical decomposition of the "GoTo" module. This module
is composed of three sub-modules: one to find the path to the destination, another to
make the actor move between the navigation points of the computed path, and the
last to make him remove any unavoidable obstacle. Figure 5 shows how these
modules are combined to make the actor go to the requested destination using the
actions he knows ("walk", "avoid obstacle", "push", "pull" and "open").
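Such a decomposition can be sketched as follows; the class and function names are illustrative assumptions on our part and do not reflect the actual ViBes code:

```python
class BehaviouralModule:
    """A node of the decision system: it fulfils one specific task by
    emitting an elementary action or delegating to a sub-module."""
    def __init__(self, name):
        self.name = name

    def step(self, state, goal):
        raise NotImplementedError

class GoTo(BehaviouralModule):
    """'GoTo' decomposed into path-finding, path-following and
    obstacle removal, following figure 4."""
    def __init__(self, find_path, follow_path, remove_obstacle):
        super().__init__("GoTo")
        self.find_path = find_path
        self.follow_path = follow_path
        self.remove_obstacle = remove_obstacle

    def step(self, state, goal):
        if state.get("path") is None:
            return self.find_path(state, goal)        # compute waypoints first
        if state.get("blocked"):
            return self.remove_obstacle(state, goal)  # push / pull / open
        return self.follow_path(state, goal)          # walk to next waypoint

# Sub-modules are stubbed here as functions returning elementary actions:
goto = GoTo(lambda s, g: "find path",
            lambda s, g: "walk",
            lambda s, g: "open door")
```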
Introducing Classifier Systems in ViBes
The main idea behind introducing classifier systems into ViBes is that it is possible
to let the virtual actor learn how to coherently select and combine the different
behavioral modules at its disposal, in order to generate a plan of actions that fulfills a
newly allocated task.
The learning system must be able to choose, among the existing behaviors, the ones
that can fulfill the task, and it must also plan a correct and minimal sequence of
orders to achieve its goal. This is ensured by the combined effects of the LCS
selection strategy, the application of the Q-Learning principle and the rewarding
scheme. Given a configuration of the virtual world, the LCS selects the most suitable
production rule (i.e. the classifier with the highest utility estimate) to generate the
next action to execute. The Q-Learning principle allows the evaluation of the
consequences of this action according to the task to complete and the rules of the
virtual world. A correct action implies a positive mark while an incorrect one
produces a penalty. This reward, or penalty, is added to the strength of the classifier
that produced the action. As the LCS proceeds through its learning phase, the
selection rate of good actions increases (because of the reinforcement of the utility of
the corresponding production rules) while the incorrect ones are more and more
ignored. Finally, the rewarding system grants a bonus according to the number of
steps required to achieve the intended goal. Granting a higher bonus to the shortest
plans of actions allows the LCS to converge towards a minimal sequence that avoids
inappropriate or useless sub-sequences.
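A minimal sketch of this rewarding scheme, with illustrative constants (the actual marks and bonus values used in the system are not specified in the paper):

```python
def terminal_reward(base_mark, steps, max_steps, bonus=100.0):
    """Reward granted when the task is fulfilled: a fixed base mark plus
    a bonus that is higher for shorter plans, so that the LCS converges
    towards a minimal sequence of actions."""
    return base_mark + bonus * (max_steps - steps) / max_steps

def reinforce(utility, reward, beta=0.2):
    """Widrow-Hoff style update of a classifier's utility estimate:
    a positive mark raises it, a penalty lowers it."""
    return utility + beta * (reward - utility)
```

For instance, a 5-step plan earns a larger terminal reward than a 15-step plan for the same task, pushing the system away from useless detours.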
Figure 6: Introducing LCS as a behavioral module in ViBes.
Classifier systems are simply introduced into ViBes as behavioral modules (figure 6)
that must evolve to fulfill a new specific task. In such a behavioral module, a
classifier system is introduced as follows.
Environmental situations are processed by the perceptive filters. Their function is to
transform the perceived data into a vector of relevant values that forms the input of
the classifier system according to the task to learn. A generalization of these values
then forms the condition part of the classifiers.
The population of production rules (a.k.a. classifiers) is stored in the solver part of
the module. The solver performs the action selection procedure as well as the
generation of new rules and the reward distribution mechanisms. Unlike in classical
classifier systems, the advocated action is not directly sent to the effectors of the
virtual character. The system first evaluates its validity: an action is valid if all its
preconditions are fulfilled (a virtual actor cannot walk if it is sitting). The action is
then sent to the instructions sequencer to be executed in the virtual environment.
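This validity check can be sketched as a simple precondition test; the predicate table below is an illustrative assumption:

```python
def valid(action, actor_state, preconditions):
    """An advocated action is executed only if all its preconditions
    hold in the current state of the actor; an action with no declared
    precondition is always valid."""
    return all(check(actor_state) for check in preconditions.get(action, []))

# Illustrative precondition table: a sitting actor cannot walk.
preconditions = {"walk": [lambda state: not state["sitting"]]}
```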
The estimation of the reward to allocate is done by the behavioral manager, which
monitors the progress of the task fulfillment.
Lastly, introducing LCS as behavioral modules in ViBes makes it possible to use the
hierarchical structure of the framework to activate several classifier systems in
cascade. Thus, to fulfill a complex task, it is possible to conceive a hierarchical set
of LCS of lesser complexity. This is interesting, for it is a way to avoid one of the
main problems of LCS: the difficulty to form really long sequences of actions.
Results
The first step in the implementation of ViBes was to produce a set of 40
elementary behavioral modules in order to validate the architecture. These modules
are of three kinds: motion modules (path-planning, path-following and collision
avoidance), selection modules (choice of a virtual element according to its features)
and interactive modules (grab, push, throw, etc.). ViBes was then integrated into the
V-Man industrial project, and this framework has enabled the animation, under
human directions, of a virtual actor in a virtual world filled with interactive objects.
The second part of the project was the validation of LCS as behavior-producing
tools. In this part, we proposed to make virtual actors living in a virtual house learn
everyday tasks. Their goals are to satisfy elementary needs such as "feed yourself",
"divert yourself" or "rest yourself".
In this paper, we focus on the "feed yourself" problem. In this case, we chose to
implement two LCS. The first one is the "EatSomething" behavioral module (its goal
is to let the actor find some food, cook it if necessary and eat it), and the second one
is the "CookFood" module (its goal is to make the actor cook its food). The first
module can eventually trigger the activation of the second one. In this part, we only
show the implementation and the results of the "CookFood" LCS.
The LCS we used is not a classical XCS but a new extension called GHXCS3. The
particularity of this LCS is to replace the usual binary and ternary strings that encode
inputs, conditions and actions with heterogeneous vectors of typed genes (bit, trit,
integer, real, list, etc.). This simplifies the implementation of new learning problems,
for it is no longer necessary to translate the relevant perceived data into a binary
encoding.
3 Generic Heterogeneous Classifier Systems [17]
This necessary step in the implementation of problems with LCS is shown in figure
7: from the relevant data observed in the virtual environment, we create a situation
vector (the input). The generalization of each term of this vector creates the
condition part of a referring classifier. The action part is an integer index that
indicates the instruction to advocate4. This instruction can either be a trigger of
another behavioral module or an elementary interaction to perform in the virtual
environment. Eventually, a parameter that is specific to a particular indexed
instruction can be added to the action part. The implementation of the problem is
finalized by the determination of its stopping condition. In the case of the
"CookFood" problem, the task is fulfilled when the food is cooked and the cooking
device is off. The reward is given to the learning system at this point.
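The typed-gene encoding can be illustrated as follows. The gene types shown (wildcard, numeric interval, exact value) and the situation vector are our assumptions for illustration, not the published GHXCS interface:

```python
def gene_matches(gene, value):
    """One typed gene generalises one perceived value: None plays the
    role of '#', a (lo, hi) tuple is a numeric interval, and anything
    else must match the value exactly."""
    if gene is None:
        return True
    if isinstance(gene, tuple):
        lo, hi = gene
        return lo <= value <= hi
    return gene == value

def condition_matches(condition, situation):
    """A condition is a fixed-length vector of typed genes that must
    all match the corresponding terms of the situation vector."""
    return len(condition) == len(situation) and all(
        gene_matches(g, v) for g, v in zip(condition, situation))

# Hypothetical situation vector: (food grabbed?, food cooked?, device, device on?)
situation = (1, 0, "oven", 0)
condition = (1, 0, None, (0, 0))  # food grabbed and raw, any device, device off
```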
Figure 7: Implementation of the "CookFood" problem.
Once the system has been initialized with the referring classifier, the virtual actor is
set in a learning situation (inside the house, with the objective to feed itself). Once
the learning is considered done (the actor fulfills its task whatever the starting
situation), we can analyze the population of classifiers and extract the best rules in
order to represent the new behavior as an automaton. Figure 8 shows the result
obtained for the "CookFood" problem.
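Extracting the automaton amounts to keeping, for each distinct condition, the rule with the highest utility; a minimal sketch, with illustrative rule contents:

```python
def extract_best_rules(population):
    """Keep, for each condition, the classifier with the highest
    utility; the resulting condition -> action map is the learned
    behavior, ready to be drawn as an automaton."""
    best = {}
    for condition, action, utility in population:
        if condition not in best or utility > best[condition][1]:
            best[condition] = (action, utility)
    return {cond: act for cond, (act, _) in best.items()}

# Hypothetical excerpt of a learned population for "CookFood":
rules = [("grabbed,raw", "open oven", 95.0),
         ("grabbed,raw", "use stove", 40.0),
         ("in oven,raw", "switch on", 90.0)]
```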
4 For clarity of presentation, in figure 7 we have limited the indexation of
instructions to the ones relevant to the "CookFood" problem. In reality, this
indexation considers all the behaviors the actor can perform.
Figure 8: "CookFood" problem automaton.
This automaton shows several interesting facts. First, the virtual actor has learned to
accomplish the requested task, and the generated behavior is both realistic and
coherent considering our expectations: if the actor chooses to cook its food in the
oven, then it puts the food in the oven (taking care to open it beforehand if
necessary), closes it, switches it on and waits until the cooking is done. At the end,
the actor switches the oven off. Second, we can see that the actor has learned two
alternative plans: one using the stove and one using the oven. Last, for each of these
plans, the learned strategy is always the shortest one according to the actor's
capacities and knowledge. Besides, the actor avoids any non-relevant action
considering the task to solve.
The two LCS ("EatSomething" and "CookFood") were introduced into the V-Man
application and the following character animation was produced (figure 9).
Figure 9: 1- The virtual character on the left is hungry and goes towards the
kitchen. 2- It opens the closed door of the kitchen. 3- The character grabs a raw apple.
4- It decides to cook it in the oven. 5- It waits for the end of the cooking. 6- The
character eats the cooked apple. 7- The character throws away the remains of the apple.
Conclusion
When designing autonomous virtual characters, one of the major difficulties is the
establishment of the behaviors of these entities. Indeed, in a realistic environment,
both complex and dynamic, considering every situation which a virtual character
may encounter while trying to fulfill a specific task is a difficult and constraining
work (but a necessary one if we intend to completely describe a behavior using
scripts or automatons). The results that we obtained during the integration of
classifier systems into a functional behavioral architecture showed us that this
method of evolutionary training makes it possible to generate new coherent
individual behaviors from existing ones. Moreover, it proved that these systems
make it possible to quickly extend the functionalities of the synthetic actors of the
V-Man application at lower cost (the total duration of the implementation and
training of a new problem by a confirmed user being approximately an hour and a
half). However, one should not lose sight of the fact that classifier systems are
evolutionary systems and, consequently, convergence towards a suitable base of
rules can be difficult to obtain (it may be difficult for a non-expert user to adequately
tune the different parameters and subsystems involved in LCS convergence). Lastly,
within the framework of the V-Man application, we restricted the use of LCS to the
learning of non-trivial but still relatively simple individual tasks. It seems necessary
to extend our study to more complex tasks, and to consider the use of LCS for the
learning of collective behaviors (simulation of missions, team games, social systems,
etc.).
References
1. Booker L.B., Goldberg D.E., and Holland J.H. “Classifier Systems and Genetic
Algorithms”. Artificial Intelligence, 40 :235-282, 1989.
2. Bonasso P.R., Firby J.R., Gat E., Kortenkamp D., Miller D.P. and Slack M.G. “Experiences
with an architecture for intelligent, reactive agents”. Journal of Experimental and
Theoretical Artificial Intelligence, Vol 9(1), pp 237-256, 1997.
3. Butz M.V., Goldberg D.E., and Stolzmann W. “New challenges for an ACS: Hard
problems and possible solutions” in Technical Report, University of Illinois at
Urbana-Champaign, Number 99019, October 1999.
4. Charles, F., Cavazza, M., and Mead, S.J.: Generating Dynamic Storylines Through
Characters’ Interactions. International Journal on Intelligent Games and
Simulation, vol. 1, no. 1, p. 5- 11,March, 2002.
5. Donikian S. “HPTS: a behaviour modelling language for autonomous agents” in
Proceedings of the Fifth International Conference on Autonomous Agents, pp. 401-408,
ACM Press, May 2001.
6. Heguy O. “Architecture comportementale pour l’émergence d’activités coopératives en
environnement virtuel”. Thèse de Doctorat, Université Paul Sabatier, Toulouse
(France), December 2003.
7. Holland J. H. “Adaptation” in R. Rosen and F. M. Snell, editors, Progress in Theoretical
Biology. New York: Plenum, 1976.
8. Thalmann D. and Kallmann M. “Direct 3D Interaction with Smart Objects”, November
1999.
9. Lattaud C. “Non-homogenous classifier systems in a macro-evolution process” in 2nd
International Workshop on Learning Classifier Systems, pp. 266-271, 13 July 1999.
10. Minsky M. “The society of mind”. Simon and Schuster Inc., New York, NY. 1985
11. Menou E., Philippon L., Sanchez S., Duchon J., Balet O., "The V-Man Project: towards
autonomous virtual characters", Second International Conference on Virtual
Storytelling, Published in Lecture Notes in Computer Science, Springer, Vol. 2897,
2003.
12. Newell A. and Simon H.A. “Computer Science as Empirical Enquiry”, Communications of
the ACM, Vol. 19, pp. 113-126, 1976.
13. Olivier V., "Economie et action : un modèle évolutionniste d'apprentissage technologique",
Université des Sciences Sociales, June 1996.
14. Wilson S.W. “ZCS: A zeroth level classifier system”. Evolutionary Computation, 2(1): 1-18,
1994.
15. Wilson S.W. “Classifier Fitness Based on Accuracy”. Evolutionary Computation, 3(2): 149-
175, 1995.
16. Perlin K., Goldberg A. “Improv: a system for scripting interactive actors in virtual worlds”,
proceedings of SIGGRAPH’96, 1996, New Orleans, 205-216
17. Sanchez S., « Mécanismes évolutionnistes pour la simulation comportementale d’humains
virtuels », Thèse de Doctorat, Université Toulouse 1 Sciences Sociales (Toulouse),
December 2004.
18. Sanza C., « Evolution d'entités virtuelles coopératives par systèmes de classifieurs », Thèse
de Doctorat, Université Paul Sabatier (Toulouse), June 2001.
19. Stolzmann W. “Latent learning in Khepera robots with Anticipatory Classifier Systems” in
International Workshop on Learning Classifier Systems (2.IWLCS) on the Genetic
and Evolutionary Computation Conference (GECCO-99), pp 290-297, Orlando, Florida,
1999.
20. Schulenburg S., Ross P. "Strength and Money: An LCS Approach to Increasing Returns",
IWLCS’2000, Third International Workshop on Learning Classifier Systems, Paris,
September 2000.
21. Thalmann D., Musse S.R. and Kallmann M. “Virtual Humans’ Behavior: Individuals,
Groups and Crowds”, Proceedings of Digital Media Futures International Conference,
Bradford (United Kingdom), 1999.
22. Kovacs T. “Strength or Accuracy: Credit assignment” in Learning Classifier Systems,
Springer, 2004.