The epidemic of innovation – playing around with an agent-based model
Pietro Terna
Dipartimento di Scienze economiche e finanziarie, Università di Torino, Italia [email protected]
Abstract. The artificial units of an agent based model can play around to diffuse innovation and new ideas, or act to preserve the status quo, escaping from advances in technology or organizational methods, or from new ideas and proposals, exactly as the agents in an epidemic situation can act to diffuse the contagion or to avoid it. The emerging structure is obviously a function of the density of the agents, but its behavior can vary in a dramatic way if a few agents are able to evolve some form of intelligent behavior. In our case, intelligent behavior is developed by allowing the agents to plan actions using artificial neural networks or, as an alternative, reinforcement learning techniques.
The proposed structure of the neural networks is self-developed via a trial-and-error process: the cross-target method develops the structure of the neural function, correcting both the guesses about the actions to be done and those about the related consequences. The reinforcement learning model is built upon the Swarm Like Agent Protocol in Python (SLAPP) tool, a recent implementation of the standard Swarm function library for agent based simulation (www.swarm.org), written in Python (www.python.org), a powerful and simple language: the result is also very useful from a didactic perspective. CTs at present run only in Swarm, but a SLAPP version is under development. A control implementation of the SLAPP code has also been introduced, using NetLogo (http://ccl.northwestern.edu/netlogo/).
Keywords: Artificial Neural Networks, Reinforcement learning, Innovation,
Agent Based Simulation, Swarm protocol.
1 Introduction to the ERA (Environment, Rules, Agents)
and CTs (Cross Targets) schemes
To evaluate the consequences of the different behavioral schemes adopted by
agents from the point of view of the innovation and complexity framework, we
propose here two original structures that are useful to build agent based simulation
models in a standardized way, adding a third classical structure, that of the
reinforcement learning.
The first one (ERA) is related to the structure of the models; the second one
(CTs) to the possibility of using Artificial Neural Networks to regulate agent
behavior in a plausible way. We also introduce a straightforward protocol, coming from the Swarm heritage, to code this kind of model using Python, or other modern, simple object-oriented languages, such as Ruby.
The main goal is a didactic one: to expose undergraduate and graduate students to the construction of agent based simulation models using a simple low level language, in order to avoid any black box effect; but we also have a second goal, that of promoting the openness of this kind of model, to make replication simpler or to simplify direct use and re-use.
1.1 The ERA scheme
The building process of agent based simulation models needs some degree of
standardization, mainly when we go from simple models to complex results.
Fig. 1. The Environment, Agents and Rules (ERA) scheme.
The main value of the Environment-Rules-Agents (ERA) scheme, introduced in
Gilbert and Terna (2000) and shown in Fig. 1, is that it keeps both the
environment, which models the context by means of rules and general data, and
the agents, with their private data, at different conceptual levels. With the aim of
simplifying the code design, agent behavior is determined by external objects,
named Rule Masters, that can be interpreted as abstract representations of the
cognition of the agent or, practically, as its “brain”. Production systems, classifier
systems, neural networks and genetic algorithms are all candidates for the
implementation of Rule Masters. We also need to employ meta-rules, i.e., rules
used to modify rules (for example, the training side of a neural network). The
Rule Master objects are therefore linked to Rule Maker objects, whose role is to
modify the rules mastering agent behavior.
Agents may store their data in a specialized object, the DataWarehouse, and may interact both with the environment and with other agents via another specialized object, the Interface (DataWarehouse and Interface are not represented in Fig. 1, having a simple one-to-one link with their agent).
In the case of the reinforcement learning techniques, the creation of the rules is typically made off-line, in a sort of stand-alone rule-making step; the learned solution is then used to run the model, as an efficient Rule Master; obviously, intermediate solutions exist.
Although this code structure appears to be complex, there is a benefit when we
have to modify a simulation. The rigidity of the structure then becomes a source
of clarity.
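The ERA layering described above can be sketched in a few lines of Python; the class and method names (RuleMaster, RuleMaker, apply, update) are our own illustrative choices for the sketch, not the SLAPP API.

```python
# A minimal sketch of the ERA layering: agent behavior is delegated to an
# external Rule Master, and a Rule Maker (a meta-rule) can swap the rule.

class RuleMaster:
    """External object deciding an agent's action from its state."""
    def __init__(self, rule):
        self.rule = rule                      # a plain function stands in here

    def apply(self, agent_state):
        return self.rule(agent_state)


class RuleMaker:
    """Meta-rule object: modifies the rule held by a Rule Master."""
    def update(self, rule_master, new_rule):
        rule_master.rule = new_rule


class Agent:
    """An agent holds its private data; behavior lives in its Rule Master."""
    def __init__(self, state, rule_master):
        self.state = state
        self.rule_master = rule_master

    def step(self):
        return self.rule_master.apply(self.state)


# usage: a trivial rule, then a meta-rule swapping it at run time
agent = Agent(state=3, rule_master=RuleMaster(lambda s: s + 1))
print(agent.step())                           # 4
RuleMaker().update(agent.rule_master, lambda s: s * 2)
print(agent.step())                           # 6
```

The point of the sketch is the separation of levels: the agent never changes itself, only its Rule Master does.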
1.2 The CTs scheme to use neural networks
To develop our agent based experiments, we introduce the following general
hypothesis (GH): an agent, acting in an economic environment, must develop and
adapt her capability of evaluating, in a coherent way, (1) what she has to do in
order to obtain a specific result and (2) how to foresee the consequences of her
actions. The same is true if the agent is interacting with other agents. Beyond this
kind of internal consistency (IC), agents can develop other characteristics, for
example the capability of adopting actions (following external proposals, EPs) or
evaluations of effects (following external objectives, EOs) suggested by the
environment (for example, following rules) or by other agents (for example, imitating them). Those additional characteristics are useful for a better tuning of
the agents in making experiments.
Economic behavior, simple or complex, can appear directly as a by-product of
IC, EPs and EOs. With our GH, and hereafter with the Cross Target (CT) method,
we work at the edge of Alife techniques to develop Artificial Worlds of simple
bounded rationality adaptive agents: complexity, optimizing behavior and
Olympic rationality can emerge from their interaction, but they do not belong
directly to the agents.
The name cross-targets (CTs) comes from the technique used to figure out the
targets necessary to train the Artificial Neural Networks (ANNs) representing the
adaptive agents that populate our experiments.
Both the targets necessary to train the network from the point of view of the actions and those connected with the effects are built in a crossed way, as reported in Fig. 2 (indeed, "Cross Targets"). The former are built in a consistent way
with the outputs of the network concerning the guesses of the effects, in order to
develop the capability to decide actions close to the expected results. The latter,
similarly, are built in a consistent way with the outputs of the network concerning
the guesses of the actions, in order to improve the agent's capability of estimating
the effects emerging from the actions that the agent herself is deciding.
Fig. 2. The Cross Targets (CTs) scheme.
The method of CTs, introduced to develop economic subjects' autonomous
behavior, can also be interpreted as a general algorithm useful for building
behavioral models without using constrained or unconstrained optimization
techniques. The kernel of the method, conveniently based upon artificial neural
networks (but it could also be conceivable with the aid of other mathematical
tools), is learning by guessing and doing: the subject control capabilities can be
developed without defining either goals or maximizing objectives.
We choose the neural networks approach to develop CTs, mostly as a
consequence of the intrinsic adaptive capabilities of neural functions. Here we
will use feed forward multilayer networks.
Fig. 2 describes an artificial agent learning and behaving in a CT scheme. The artificial adaptive agent (AAA) has to produce guesses about its own actions and related effects, on the basis of an information set (the input elements are I1,...,Ik). Remembering the requirement of IC, the targets in the learning process are: (i) on one side, the actual effects - measured through accounting rules - of the actions made by the simulated subject; (ii) on the other side, the actions needed to match the guessed effects. In the
last case we have to use inverse rules, even though some problems arise when the
inverse is indeterminate.
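The crossed construction can be illustrated numerically, reduced to one action and one effect: the accounting rule (effect = 2 × action), its inverse, and the learning rate are invented for the sketch, and two scalar guesses stand in for the outputs of the ANN.

```python
# A toy one-action, one-effect cross-target step: each guess is corrected
# toward a target built from the *other* guess, hence "cross targets".

def accounting_rule(action):          # the actual effect of a guessed action
    return 2.0 * action

def inverse_rule(effect):             # the action needed to obtain an effect
    return effect / 2.0

def cross_target_step(guess_action, guess_effect, lr=0.5):
    # target for the effect guess: the effect the guessed action would produce
    target_effect = accounting_rule(guess_action)
    # target for the action guess: the action matching the guessed effect
    target_action = inverse_rule(guess_effect)
    # move each guess toward its crossed target (a stand-in for backprop)
    new_action = guess_action + lr * (target_action - guess_action)
    new_effect = guess_effect + lr * (target_effect - guess_effect)
    return new_action, new_effect

a, e = 1.0, 6.0                       # initially inconsistent guesses
for _ in range(20):
    a, e = cross_target_step(a, e)
print(round(e / a, 3))                # converges so that e ≈ 2a
```

No goal or objective function appears anywhere: internal consistency alone drives the adaptation, which is the point of the method.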
A first remark, about learning and CTs: analyzing the changes of the weights during the process, we can show that the matrix of weights linking the input elements to the hidden ones changes little or not at all, while the matrix of weights from the hidden to the output layer changes in a relevant way. Only the hidden-output weight changes determine the continuous adaptation of the ANN responses to the environment modifications, as the output values of the hidden layer elements stay almost constant. This situation is the consequence both of the very small changes in the targets (generated by the CT method) and of the reduced number of learning cycles.
The resulting network is certainly under-trained: consequently, the simulated economic agent develops a local ability to make decisions, but only by adapting the outputs to the last targets, regardless of the input values. This is short term learning as opposed to long term learning.
Some definitions: we have (i) short term learning, in the acting phase, when
agents continuously modify their weights (mainly from the hidden layer to the
output one), to adapt to the targets self-generated via CT; (ii) long term learning,
ex post, when we effectively map inputs to targets (the same generated in the
acting phase) with a large number of learning cycles, producing ANNs able to
definitively apply the rules implicitly developed in the acting and learning phase.
A second remark, about both external objectives (EOs) and external proposals (EPs): if used, these values substitute for the cross targets in the acting and adapting phase and are consistently included in the data set for ex post learning. Instead of following the target coming from the actions, the guess of an effect can be trained to approximate a value suggested by a simple rule, for example that of increasing wealth. This is an EO in CT terminology. Its indirect effect, via CT, will modify the actions, making them more consistent with the (modified) guesses of the effects. Vice versa, the guess about an action to be accomplished can be modified via an EP, indirectly affecting also the corresponding guesses of the effects. If EO, EP and IC conflict in determining behavior, complexity may emerge also within agents, but in a bounded rationality perspective, always without the optimization and full rationality apparatus.
1.3 The reinforcement learning algorithm
CTs gave useful results in other domains, as in Terna (2002), and will be used here to obtain further results. To start the exploration of the field we need a more direct tool, like reinforcement learning.
We can introduce it as follows, referring to Sutton and Barto (1998) for the
methodological basis.
We have
1. a set of states S, related to an environment;
2. a set of possible actions A;
3. a set of scalar rewards, in ℝ.
At any time t we have an agent in a state s_t of S and we can choose an action a in A(s_t). After the action, the agent will be in state s_{t+1}, with a reward r_{t+1}. Rewards are summed over time with a discount rate factor. Our agent develops the capability of mapping all the possible actions of A in a state of S to all the related rewards.
In our case the map is based on a set of neural networks, in some way
simplifying the CTs perspective, but operating in the same direction.
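The state-action-reward machinery above can be sketched with a minimal tabular update in the spirit of Sutton and Barto (1998); here a lookup table replaces the paper's set of neural networks, and the states, rewards, learning rate and discount rate are invented for the sketch.

```python
# A minimal tabular state-action-value update: the table maps (state, action)
# pairs to estimated discounted rewards, as in the mapping described above.

from collections import defaultdict

GAMMA = 0.9              # discount rate factor for summing rewards over time
ALPHA = 0.5              # learning rate

Q = defaultdict(float)   # maps (state, action) to an estimated reward value

def update(s, a, r, s_next, actions_next):
    """One-step update of the value estimate for taking action a in state s."""
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# usage: a toy two-state world where action "go" from s0 yields reward 1
for _ in range(50):
    update("s0", "go", 1.0, "s1", ["stay"])
print(round(Q[("s0", "go")], 2))   # the estimate converges to 1.0
```

The paper's implementation replaces the table with a set of neural networks approximating the same map, but the direction of the update is the same.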
2 Using our schemes in SLAPP
Our proposal is that of using SLAPP, which can be found at http://eco83.econ.unito.it/terna/slapp, to develop this kind of model. The basic code demonstrates that we can implement a rigorous protocol like that of Swarm (http://www.swarm.org) with a simple coding system, consistently with the goals exposed in the premise. At the same SLAPP web address the Chameleons application may also be found, both in the SLAPP and in the NetLogo versions.
2.1 The Chameleon metaphor to play around with the epidemic of
innovation: the basic tool, implemented by reinforcement learning
The metaphorical model we use here is that of the changing color chameleons and, more precisely, that of Taormina's Chameleons¹.
In the starting phase we have chameleons of three colors: red, green and blue.
When two chameleons of different colors meet, they both change their color,
assuming the third one. If all chameleons get the same color, we have a steady
state situation. This case is possible, although rare.
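The meeting rule above takes only a few lines of Python; the function name `meet` is ours.

```python
# The chameleon meeting rule: when two chameleons of different colors meet,
# both take the third color; same-color meetings change nothing.

COLORS = {"red", "green", "blue"}

def meet(c1, c2):
    """Return the colors of the two chameleons after they meet."""
    if c1 == c2:
        return c1, c2                        # same color: nothing happens
    third = (COLORS - {c1, c2}).pop()        # the one remaining color
    return third, third

print(meet("red", "green"))   # ('blue', 'blue')
```

The rare steady state mentioned in the text is simply the configuration in which `meet` can no longer change anything, i.e. all agents share one color.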
The metaphor is interpreted in the following way: an agent diffusing innovation
(or political ideas) can change itself through the interaction with other agents: as
an example think about an academic scholar working in a completely isolated
context or, on the contrary, interacting with other scholars or with private
entrepreneurs to apply the results of her work. On the opposite side, an agent diffusing epidemics modifies the others without changing itself: we will also introduce hyper-chameleons, able to do that. The simple model moves agents and changes their colors, when necessary. But what if the chameleons of a given color want to preserve their identity?
¹ I have to thank Riccardo Taormina, an undergraduate student of mine, for developing this kind of
application with great involvement and creativity. Many thanks also to Marco Lamieri, a former PhD
student of mine, for introducing the powerful chameleon idea.
Going back to the reinforcement learning, the states of the world s are here represented by the 5×5 matrices related to the different spaces, like that of Fig. 3a, defined in a bounded rationality perspective, the agent being unable to see a larger portion of its world.
Fig. 3. (a) The bounded rationality view of the agent² and (b) its possible moves.
In Fig. 3a, from the point of view of the central red chameleon, the "enemies" are contained in the cells containing 1 in the numerical matrix; the possible moves are shown in part (b) of Fig. 3.
The evaluation of the rewards is very simple: considering only a 3×3 space, as in Fig. 4, the red chameleon in the central position of the red square has three enemies around it; moving north, to the center of the green square, it would have only two (visible) enemies, with a +1 reward; the reward can also be negative.
Fig. 4. Two different situations: being in the center of the red or of the green square.
We are here always considering two steps jointly, summing up their rewards, without the use of a discount rate. The double-step analysis compensates for the highly bounded rationality applied both to the knowledge of the world (limited in each step to a 5×5 space) and to the evaluation of the rewards, on a 3×3 area.
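The reward evaluation can be sketched as counting visible enemies before and after a move; the grid encoding follows Fig. 3a (enemies marked 1), while the helper names and the example grid are ours.

```python
# A sketch of the reward computation: count "enemies" (cells marked 1) in
# the 3×3 area around a position, and reward a move by the drop in that count.

def enemies_around(grid, row, col):
    """Enemies visible in the 3×3 square centered on (row, col)."""
    total = 0
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if (r, c) != (row, col) and 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                total += grid[r][c]
    return total

def reward(grid, pos_before, pos_after):
    """Positive if the move reduces the visible enemies, negative otherwise."""
    return enemies_around(grid, *pos_before) - enemies_around(grid, *pos_after)

# usage: like the Fig. 4 situation, three enemies around the center at (2, 2),
# only two visible after moving north to (1, 2), hence a +1 reward
grid = [[0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0]]
print(reward(grid, (2, 2), (1, 2)))   # 1
```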
We have 9 neural networks, one for each of the possible moves in the 3×3 space, with 25 input elements (the 5×5 space), 10 hidden elements and a unique output, which is a guess of the reward that can be obtained by choosing each one of the 9 possible moves in the presence of that specific situation of the 5×5 space. The reward here is the moving sum of the rewards of two subsequent moves.
² We know that the small pictures do not represent chameleons, but geckos; we will correct the aesthetic side of the model in the near future …
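The move choice can be sketched as nine small feed-forward networks, each guessing the reward of one move, with the agent picking the move whose guess is highest; untrained random weights stand in here for the weights learned in the simulation, and the helper names are ours.

```python
# Nine feed-forward networks (25 inputs, 10 hidden units, 1 output), one per
# possible move in the 3×3 space; the agent picks the highest guessed reward.

import math
import random

random.seed(42)

def make_net(n_in=25, n_hidden=10):
    """Random weight matrices standing in for a trained network."""
    w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
    return w1, w2

def forward(net, x):
    """Feed-forward pass: tanh hidden layer, linear output (guessed reward)."""
    w1, w2 = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]   # the 9 moves
nets = [make_net() for _ in MOVES]                             # one net each

def choose_move(view_5x5):
    """Pick the move whose network predicts the highest reward."""
    x = [cell for row in view_5x5 for cell in row]   # flatten the 5×5 view
    guesses = [forward(net, x) for net in nets]
    return MOVES[guesses.index(max(guesses))]

view = [[0] * 5 for _ in range(5)]    # an empty bounded-rationality view
print(choose_move(view))              # one of the nine (dr, dc) moves
```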
The effects are shown in Fig. 5: on the left we have all the chameleons moving randomly in our space; on the right, the conservative (red) chameleons, adopting the reinforcement learning technique to avoid contacts, are capable of increasing in number, with the strategy of decreasing their movement and of getting close to similar agents.
Fig. 5. The random chameleons on the left and the conservative ones (red) on the right.
As a consequence, inactivity and closeness in a group may be interpreted as a
way to avoid the diffusion of new ideas or innovating behaviors, in our case from
the groups of the green or blue agents.
2.2 The behavior of the chameleons with simplified fixed rules
With the goal both of verifying our results with a completely different code and of
investigating the consequences of fixed rules, the basic model has been duplicated
with a NetLogo (http://ccl.northwestern.edu/netlogo/) program.
The moves here are random, as before, or mechanical ones: (i) do not move or (ii) go closer to your similars, with a direct movement; closeness can be defined as near closeness or as absolute closeness. In all cases, the ability of considering the area around each subject in a bounded rationality way is lost and all actions are strictly mechanical.
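The fixed "go closer to your similars" rule can be sketched as a one-step move toward the nearest same-color agent; the Chebyshev distance and the data layout are our assumptions, since the NetLogo code is not reproduced here.

```python
# A mechanical fixed rule: move one grid cell toward the nearest agent
# of the same color, with no evaluation of the surrounding area.

def step_toward_similar(me, others):
    """Return the next position: one step toward the nearest same-color agent."""
    (row, col), color = me
    same = [pos for pos, c in others if c == color]
    if not same:
        return (row, col)                        # nobody similar: stay put
    # nearest similar agent by Chebyshev (grid) distance
    tr, tc = min(same, key=lambda p: max(abs(p[0] - row), abs(p[1] - col)))
    step = lambda a, b: a + (b > a) - (b < a)    # move one cell toward target
    return (step(row, tr), step(col, tc))

# usage: a red agent at (0, 0) heads diagonally toward the red agent at (3, 4)
me = ((0, 0), "red")
others = [((3, 4), "red"), ((1, 1), "green")]
print(step_toward_similar(me, others))   # (1, 1)
```

Contrast with the reinforcement learning agents: here nothing in the 5×5 view is inspected, which is exactly the loss of bounded-rationality evaluation noted above.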
In Fig. 6a the normal random behavior gives variable numbers of colors, as before. In Fig. 6b the red chameleons do not move and take some advantage, but quite a limited one.
In Fig. 7a we see that staying close without intelligence does not give any advantage to the conservative group, which anyway keeps moving around because of the internal interactions (some hope for innovation?); the advantage arises only from absolute closeness, which also gives the by-product of immobility. (Is that the strategy adopted by academia to avoid interaction with other cultures and the related changes?)
Fig. 6. (a) The random chameleons again; (b) the winning red chameleons do not move.
Fig. 7. (a) The red chameleons looking for their similar, going near close; (b) the same
looking for absolute closeness (all the red agents are in the same place).
We can underline the difference between the more sophisticated reinforcement learning behavioral rules (see above and below) and these fixed-rule machineries guiding our agents.
2.3 Introducing Hyper-Chameleons
We now consider the introduction of more sophisticated agents: the hyper-chameleons, with (i) the possibility of using simultaneously more than three types of agents with different colors, (ii) the presence of agents avoiding contacts (as before, the conservative ones), now named "runners", and (iii) that of agents looking for contacts, named "chasers"; the latter are able to change their world and can be considered to be the innovators.
All the combinations of colors are now possible: if A hits B the resulting color
can be C (or D or F … following probabilistic schemes), but also A can keep its
color, B can keep its color, both can adopt the A or the B color, with the
possibility of setting these conditions directly in the interface of the model. The
brains of the agents are developed as before, with two different types (chaser and
runner) and with the possibility of developing and adopting other brains, changing
the Rule Masters as introduced in Fig. 1.
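The configurable meeting outcomes described above can be sketched as a probability table; the specific probabilities and the table layout are invented here, standing in for the conditions set directly in the interface of the model.

```python
# A sketch of hyper-chameleon meetings: when A hits B, the resulting pair
# of colors is drawn from a configurable outcome table.

import random

# outcome table for a meeting (A, B): list of ((new_A, new_B), probability)
OUTCOMES = {
    ("red", "green"): [(("blue", "blue"), 0.5),    # classic third color
                       (("red", "red"), 0.25),     # B adopts A's color
                       (("red", "green"), 0.25)],  # both keep their color
}

def hit(a, b, rng=random.random):
    """Resolve a meeting between colors a and b using the outcome table."""
    if a == b or (a, b) not in OUTCOMES:
        return a, b                        # no rule configured: nothing happens
    draw, cum = rng(), 0.0
    for (new_a, new_b), p in OUTCOMES[(a, b)]:
        cum += p
        if draw < cum:
            return new_a, new_b
    return a, b

# usage: forcing the draw makes the outcome deterministic for inspection
print(hit("red", "green", rng=lambda: 0.6))   # ('red', 'red')
```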
We present below a few starting cases and introduce a dictionary: adaptation =
changing color; chaser = using intelligent rules of chasing, to diffuse innovation
or new ideas, with adaptation; runner = conservative agent using intelligent rules
to avoid innovation and new ideas, with adaptation; super-chaser = using
intelligent rules as before, but without adaptation; zero intelligence = acting
randomly, with or without adaptation. Remember that in our scheme an agent
diffusing innovation (or political ideas) can change itself via the interaction with
other agents; an agent diffusing epidemics does not change itself.
Fig. 8. Brute force, without any adaptation, in a classic epidemic way.
Case 1 [red = zero intelligence without adaptation; green and blue = zero
intelligence with adaptation]: this is a brute force application, without any
adaptation for the red agents, in a classic epidemic way, which is obviously
unrealistic in a process of innovation and of spreading new ideas in a complex
system like the contemporary world.
We have here three breeds: the red chameleons moving randomly, but always
changing their color to red (always conserving their color); green chameleons
moving randomly and changing to red or blue with the same probability; blue
chameleons moving randomly and changing to red or green with the same
probability. As you can see in Fig. 8, the red chameleons are colonizing the world.
Case 2 [red = super-chaser; green and blue = zero intelligence with adaptation]: red chameleons are super-chasers, adopting intelligence (using a "chaser brain") to act in a classic epidemic way, without any adaptation; the scheme is again unrealistic if adopted in a process of innovation or of spreading new ideas in a complex contemporary world.
Fig. 9. A super-chaser, adopting intelligence (using a “chaser brain”) acting in a
classic epidemic way.
We again have three breeds: the red chameleons moving as chasers and always changing their color to red (always conserving their color); green chameleons moving randomly and changing color to red or blue with the same probability; blue chameleons moving randomly and changing color to red or green with the same probability. As you can see in Fig. 9, the red chameleons are again colonizing the world.
Case 3 [red = chaser; green = runner; blue = zero intelligence with
adaptation]: chasers and runners both acting with intelligence and adaptation,
with the conservative behavior winning. The first ones are using a chaser brain,
but now changing their color; the second ones use a runner brain to avoid
innovation and ideas, changing their color.
Three breeds: the red chameleons moving as chasers and changing color to green or blue with the same probability; the green chameleons moving as runners and changing color to red or blue with the same probability; blue chameleons moving randomly and changing to red or green with the same probability. As you can see in Fig. 10, the conservative intelligent behavior of the green chameleons wins.
Case 4 [red = super-chaser; green = runner; blue = zero intelligence with adaptation]: super-chasers and runners both acting with intelligence, but the super-chasers without adaptation; is that a necessary condition, practically impossible to adopt in a modern complex world, to diffuse innovation and new ideas?
Fig. 10. The conservative intelligent behavior of green chameleons wins.
Fig. 11. The red super-chaser chameleons win against the conservative intelligent
behavior of green chameleons.
Three breeds: the red chameleons moving as chasers and always changing their color to red (always conserving their color); the green chameleons moving as runners and changing color to red or blue with the same probability; blue chameleons moving randomly and changing to red or green with the same probability. As you can see in Fig. 11, the red super-chaser chameleons win against the conservative intelligent behavior of the green chameleons.
3 Further steps and improvements
We now have to play around with our models, experimenting with the consequences of many possible initial hypotheses, as in an artificial laboratory; we also have to add a full CTs strategy, with learning neural networks internally building their targets; we can verify the consequences of adopting a larger area of knowledge and of movement, also relating these different dimensions to the presence or absence of some form of cooperation and of social capital.
Finally, easier ways to show our results can possibly be found through the adoption of the more recent tools, able to produce simulations with an appearance close to that of a videogame.
We are looking in particular at programs like (i) StarLogo TNG (TNG = the next generation), at http://education.mit.edu/starlogo-tng/, coming from Logo and StarLogo, or (ii), from the same institution and with similar characteristics, the newborn Scratch, at http://weblogs.media.mit.edu/llk/scratch/.
References
Gilbert N. and Terna P. (2000), How to build and use agent-based models in
social science, in «Mind & Society», 1, 1, pp. 57-72.
Sutton R. S. and Barto A. G. (1998), Reinforcement Learning: An Introduction,
Cambridge MA, MIT Press.
Terna P. (2002), Cognitive Agents Behaving in a Simple Stock Market Structure,
in F. Luna and A. Perrone (eds.), Agent-Based Methods in Economics and
Finance: Simulations in Swarm. Dordrecht and London, Kluwer Academic,
pp.188-227.