An analysis of noisy RAM and neural nets


Transcript of An analysis of noisy RAM and neural nets

Physica D 34 (1989) 90-114 North-Holland, Amsterdam

AN ANALYSIS OF NOISY RAM AND NEURAL NETS

D. GORSE Department of Psychology, University College, London, UK

J.G. TAYLOR Department of Mathematics, King's College, London, UK

Received 21 January 1988 Revised manuscript received 2 August 1988 Communicated by R.S. MacKay

Identity is shown between the dynamical evolution of a net of noisy RAMS (random access memories) and one of noisy neurons, under certain assumptions on the latter. The dynamical evolution of such noisy nets is investigated both analytically and by computer, and found in the generic case to evolve rapidly to a unique stable fixed point for the RAM/neuron output probabilities.

I. Introduction

The development of a connectionist approach to artificial intelligence [1] has led to a resurgence of interest in the properties of networks of various sorts of logic elements. These go back to the decision elements of McCulloch and Pitts [2], who initiated the analysis of the logical capabilities of neural networks of such binary decision elements. Various modifications of these elements have been taken into account, such as their probabilistic nature, in order to account for noise at synapses and general fluctuations; in addition the convergence of the net activity to some set of final states has been discussed [3]. This particular aspect, and how these final states and their basins of attraction may be modified ("training" algorithms), has become of great practical interest [3, 4].

An alternative logic element to the probabilistic decision element is the probabilistic logic node (PLN), based on giving a noisy output to a RAM. These have been introduced recently by Aleksander [5], and training properties of nets composed of PLN's analysed. One of the purposes in this paper is to expand that analysis, and in particular by extending a PLN to a PRAM (probabilistic RAM) develop a general mathematical framework in terms of which PRAM nets can be analysed. This is not only of interest in its own right but is of relevance to the analysis of neural nets, since a RAM may be regarded as the ultimate non-linear decision element, in which synapse-synapse interactions are occurring (in neuronal terms).

An important feature of recent developments in PDP (parallel distributed processing) work is the possible hardware implementation of parallel processing ideas. This may be regarded as the next phase in the evolution of the digital computer, a technology which was initially stimulated by the results of [2] and similar ideas. However there is presently a gap between PDP models which are realisable in hardware and semi-realistic representations of neuronal activity. As a small step towards filling that gap, we wish to show here that noisy RAM nets can give a hardware realisation of noisy neural nets. More specifically we will

0167-2789/89/$03.50 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)


prove that the equations describing the dynamical evolution of both sorts of noisy nets are identical. This identification allows us not only to transfer concepts and results valid for noisy RAM nets to give insights into the behaviour of noisy neural nets, but also to lead to the hope that noisy neural nets can be modelled physically by noisy RAM nets. Moreover if the former are a realistic model for living neural nets then it may even be possible to design noisy RAM nets which display a modicum of intelligent behaviour.

The above possibilities naturally lead to the need to determine the general form of the dynamical evolution of the noisy nets representing either a net composed of noisy RAMs or equivalently of noisy neurons. The third purpose of this paper is to describe what this behaviour is in general terms. We will do this by use of both analytic and computing techniques. The analytic techniques only give useful results for a small range of values of the parameters of the noisy nets, or for small numbers of RAMs or neurons. The insights obtained in this manner are combined with those resulting from computer iterations of the equations for larger numbers of units. The computer simulations use randomised values of the parameters and initial points, so as to give as large a coverage of the parameter and initial value space as possible. Naturally we are most concerned with bifurcations which may be the initial stages of the period doubling or attractor circle roads to chaos [6]. Our analytic and computer results taken together give a strong indication that chaos is completely absent from the dynamical evolution of these noisy nets. The generic behaviour seems to be convergence to a unique stable fixed point, and there appears to be only this one stable mode available to the system. Moreover convergence to this fixed point is extremely rapid, on average within three or four iterations. Only on the boundary of the allowed region of parameters and initial points has there been seen any other kind of behaviour, where cycles of length bounded by N, the maximum order of the RAMs in the net, can occur.

The contents of the paper, which are an expansion of the brief report [6a], are as follows. We commence, in the next section, with a description of the single RAM and its extension to the PRAM; the equation of motion for the latter will be derived. In the following section the equations of a PRAM net will be given. Section 4 is devoted to describing the equations for a noisy neural net, and proving the identity of such nets and PRAM nets. We then give a general discussion of the dynamical behaviour we expect from such nets. Detailed results on their dynamical behaviour are obtained for 1-PRAM nets in the following section and for 2-PRAM nets in section 7. Computer simulations are described in section 8 for the case of a net of 3 2-PRAMs and 4 3-PRAMs, as well as comparisons with n randomly connected N-PRAMs for n = 20 and 50 and N = 2, 3, ..., 10. This allows us to deduce general results on the time development of these noisy nets, which are stated and discussed in section 9.

2. The Single RAM/PRAM

An N-RAM has N binary inputs, so $2^N$ addresses, each of which can give a binary output if activated. Thus a RAM defines $2^{2^N}$ possible functions (compared with $2^{N^2}$ for a binary decision neuron). Let the inputs be denoted by the N-component binary vector $i = (i_1, \ldots, i_N)$. Let $\alpha_\mu$ be binary digits, and $\alpha$ denote the $2^N$-component binary vector

$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{2^N})$. (2.1)

Let $F_\alpha$ denote the set of $2^{2^N}$ possible Boolean functions performed by an N-RAM. Then the output of an


Fig. 1. (a) The N address terminals of an N-RAM, with a single output. (b) The set of $2^N$ possible input addresses for an N-RAM (one address for each binary input on the N address terminals) and the associated $2^{2^N}$ binary memory functions, denoted by m, at the addresses.

N-RAM is

$O = F(i) = \sum_\alpha a_\alpha F_\alpha(i)$, (2.2)

where $a_\alpha$ is equal to 1 for some $\alpha = \alpha_0$ and zero otherwise. In other words, the output (2.2) of the RAM is the particular Boolean function $F_{\alpha_0}$ of the input $i$, where $a_\alpha = \delta_{\alpha\alpha_0}$. These general features are shown in fig. 1.

Thus for the cases N = 1, 2,

$F_{\alpha_1\alpha_2}(x) = \alpha_1\bar{x} + \alpha_2 x$,
$F_{\alpha_1\alpha_2\alpha_3\alpha_4}(x) = \alpha_1\bar{x}_1\bar{x}_2 + \alpha_2\bar{x}_1 x_2 + \alpha_3 x_1\bar{x}_2 + \alpha_4 x_1 x_2$, (2.3)

where we have used the convenient notation $\bar{x} = (1 - x)$ (this will not be confused with complex conjugation, since it will only be used for real variables). The possible values of the output of $F_\alpha$, for various binary inputs, are therefore the different values of the components of the binary vector $\alpha$. These outputs are shown in fig. 2 for N = 1, 2.

(a) N = 1:

x    F00  F01  F10  F11
0     0    0    1    1
1     0    1    0    1

(b) N = 2: the sixteen functions $F_{0000}, F_{0001}, \ldots, F_{1111}$ are tabulated analogously on the four inputs $x_1 x_2 = 00, 01, 10, 11$.

Fig. 2. (a) The four possible memory functions for a 1-RAM. (b) The sixteen possible such functions for a 2-RAM.


It is straightforward to extend the N-RAM to an N-PRAM (the extra P denoting "probabilistic") by allowing noise on the output function of the N-RAM. There will therefore only be a probability, not a certainty, associated with obtaining a 0 or 1 at a given address in the RAM. Such noise might be the same for all addresses, but the most general case would be that for each binary input $i$ the output is one with a probability $p_i$. Thus there is a separate probability of obtaining one for each of the $2^N$ addresses of the RAM.

The PRAM extends the PLN of Aleksander [5], who only allowed $p_i$ to be 0, ½ or 1. We are here generalising this situation, so that all of the possible outputs of the PRAM are noisy, with a continuous range of levels of noise. This is analogous to the noisy neuron, for which there is also a continuous range of probabilities of firing. It is assumed that the $p_i$'s can be modified by some teaching procedure so as to obtain maximal benefit from the noise, along the lines of ref. [5]; we will discuss the problem in more detail later. The deterministic limit of the PRAM will be given by the function $F_{\alpha_0}(i)$ above, whose values are the now binary values of the function $p_\alpha$, for the $2^N$ values of the binary input $i$.

In the noisy case, it is natural to take the input also to be noisy. Thus let $(x_1, \ldots, x_N)$ denote the probabilities associated with the inputs, i.e. $x_i$ is the probability of one on the $i$th input. Then the probability of a one on the output is given by combining the probabilities of the events of an input, with a certain binary content, and the obtaining of 1 from this input. If now $\alpha$ denotes an N-component binary vector for this input, the probability of such an input will be

$P_\alpha = \prod_{i=1}^{N} x_i^{\alpha_i}\,\bar{x}_i^{\bar{\alpha}_i}$. (2.4)

The probability of 1 as output is therefore

$P = \sum_\alpha p_\alpha P_\alpha$, (2.5)

where the summation in (2.5) is over the $2^N$ possible such binary inputs $\alpha = \{(0, 0, \ldots, 0), \ldots, (1, 1, \ldots, 1)\}$. The r.h.s. of (2.5) is clearly a probability, with $0 \le P \le 1$, hence the natural arena for PRAMs is not

binary logic but the analysis of polynomial maps of $[0, 1]^N \to [0, 1]$. Such maps will be discussed more fully when we consider PRAM nets in the next section.
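To make (2.4) and (2.5) concrete, the following is a minimal sketch (ours, not code from the paper) of the output probability of a single N-PRAM; the convention that address $\alpha$ is indexed by the integer whose binary digits are $(\alpha_1, \ldots, \alpha_N)$ is an assumption made purely for the example.

```python
from itertools import product

def pram_output_probability(p, x):
    """Eq. (2.5): P = sum over addresses alpha of p_alpha times the probability
    of that binary input, eq. (2.4), given input firing probabilities x."""
    N = len(x)
    assert len(p) == 2 ** N
    total = 0.0
    for idx, alpha in enumerate(product((0, 1), repeat=N)):
        prob_input = 1.0
        for xi, ai in zip(x, alpha):
            prob_input *= xi if ai else (1.0 - xi)
        total += p[idx] * prob_input
    return total

# Example: a 2-PRAM whose four addresses output 1 with probabilities
# (0.1, 0.2, 0.8, 0.9), driven by noisy inputs x = (0.3, 0.6).
print(pram_output_probability([0.1, 0.2, 0.8, 0.9], [0.3, 0.6]))
```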

3. PRAM Nets

In this and the following section we will set up the framework within which the main result of this paper, the equivalence of noisy RAM and neural nets, can be established. This will involve a careful examination of the properties of the two types of network. In order to motivate such a detailed discussion we will state here our main result:

The dynamical behaviour of any net of noisy neurons, satisfying assumptions (1) to (6) of the next section is identical to that of some net of noisy RAMs, and conversely. The significance of this is that noisy RAMs may be used to mimic physiological nets in hardware; the identity may permit the construction of PRAM nets which may display a degree of intelligent activity. We now turn to a description of the mathematics of PRAM nets.


In any network there is a distinction between the input and output lines on the nodes (the external lines, for which only one end is attached to a node) and those connections between the nodes of the net (the internal lines). In the case of a neural network in which activity on a neuron is linearly summed there is no restriction on the sets of input, output or internal lines. This is not so for a net of RAM's or PRAM's since each N-RAM can only have N inputs. One can set up RAM's with additional teaching inputs, but the purpose of teaching is to have only a transitory effect, and long-term activity is not affected. The above restrictions are not so important if N-RAM's with different N are being joined together (though constraints are still present), but we will not consider these inhomogeneous nets here in any detail.

Thus an N-RAM network is composed of a set of N-RAMs, whose outputs may bifurcate any finite number of times so as to become inputs of other RAM's or themselves, and which have an additional set of input and output lines. The connectivity of such a net is therefore defined by a set of R nodes, joined by directed lines. There is always one outgoing line and N incoming lines at each node; each outgoing line from a node may bifurcate repeatedly without change of message.

Small RAM nets for N = 1 and 2 are shown in fig. 3. Figs. 3(a) and (b) consider the case N = 1, where the dashed line in (b) denotes a sequence of further N = 1 RAMs. We note that for N = 1 only these two possibilities (a) or (b) occur; N = 1 networks will be discussed in more detail later (section 6). In case (b) outputs can arise from each of the 1-RAMs not shown explicitly, as also in the other cases in fig. 3 (even from explicitly shown RAMs). In (c), 2-RAMs are joined together without any self-coupling, whereas in (d) this self-coupling must occur for there to be activation of all of the inputs. (e) is the N = 2 analogue of (a); the basic unit is a pair of two 2-RAMs joined in all possible ways to the preceding and the succeeding pair. This can evidently be extended to repetitions of groups of N N-RAMs joined in all possible ways. (f) shows how repetitions can occur with a sequence of inputs; the generalisation of this to higher N is clear. Finally an inhomogeneous net is shown in (g), whose extensions to higher N are also clear.

It is useful to introduce the connectivity matrix of a given node, which we denote by $C^{(r)}$ for the $r$th node. This is an $N \times R$ matrix whose only non-zero entries are the units corresponding to the inputs; $C^{(r)}_{ij} = 1$ if the $i$th input line receives activity from the $j$th RAM. Thus the input of the $i$th line is $(C^{(r)}x)_i$ if the output of the RAMs is $x = (x_1, \ldots, x_R)$ (taken in a given order, identical for all the RAMs of the net). Thus for the cyclically connected net of fig. 3(c),

$C^{(1)} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, $C^{(2)} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$, $C^{(3)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$. (3.1)

In the case of an external input $i_r$ (a single input $i_r$ to the $r$th RAM), $C^{(r)}$ will be an $(N - 1) \times R$ matrix whose absent row corresponds to the input.

We may use the connection matrices to construct the equation of motion for the associated net as

$x^{(r)}(t+1) = F^{(r)}(C^{(r)}x(t))$, (3.2)

where time has been discretised and $F^{(r)}$ is the appropriate logic function for the $r$th RAM, of form (2.2) (the validity of the use of discrete time will be discussed in section 9). This allows analysis of the development of the state of the net in time by iteration. Since the state space (the space of binary vectors of length R) is finite with $2^R$ points, then all motions are either limit points or limit cycles of finite size. If R is large then these cycles may contain a very rich structure, and the program of Caianiello and co-workers [7] over a decade (for deterministic nets), and the efforts of other workers (for RAM nets) [8]


Fig. 3. (a) The basic unit for a chain or ring of 1-RAMs with only feed-forward connections. (b) A feedback loop for 1-RAMs, with bifurcation of the outputs. The dashed line denotes a chain of 1-RAMs. (c) A net of three 2-RAMs without self-feedback. (d) A net of two 2-RAMs with self-feedback; the dynamical equation for this is discussed in section 7. (e) The basic unit for a chain or ring of 2-RAMs with only feed-forward connections. (f) A unit of a chain of 2-RAMs with external inputs on each of them. (g) An inhomogeneous net composed of a 1-RAM and a 2-RAM; the dynamical equation for this is considered in section 7.


Fig. 4. (a) The state diagram for the net of fig. 3(c) when each of the three 2-RAMs performs the function AND. (b) Same as (a) except RAMs 1 and 2 perform the function $F_{0100}$ and RAM 3 performs $F_{1000}$.

has shown some of this complexity. A typical state diagram for fig. 3(c) is shown in fig. 4: (a) when all RAMs perform AND, and (b) when 1 and 2 are $F_{0100}$ whilst 3 is $F_{1000}$.
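The deterministic update (3.2) for small nets is easy to enumerate directly. The following sketch (our own illustration, with an assumed ordering of the RAMs) steps the three-2-RAM cyclic net of fig. 3(c) with every RAM performing AND, and prints the transitions that make up the state diagram of fig. 4(a).

```python
# Deterministic update of the fig. 3(c) net: RAM 1 reads (x2, x3), RAM 2 reads
# (x3, x1), RAM 3 reads (x1, x2), as in the connection matrices (3.1).
def step(state):
    x1, x2, x3 = state
    return (x2 & x3, x3 & x1, x1 & x2)

for s in [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]:
    print(s, "->", step(s))
```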

Our purpose here is to extend (3.2) to the PRAM. To do that we must delineate the state space of the system. The natural extension of the deterministic case would be to take the same state space of $2^R$ points, described by R-component binary vectors x, and take an associated probability $P_x$ for each such state. This state probability should be part of any probabilistic description of a PRAM (or noisy neural) net. However in both cases care must be taken in developing the equations of motion extending (3.2) into the probabilistic regime. For underlying the probability $P_x$ on each state of the net will be a probability $u_i$ for each neuron or PRAM being on (firing). Then

$P_x = \prod_{i=1}^{R} u_i^{x_i}\,\bar{u}_i^{\bar{x}_i}$, (3.3)

which is the net analogue of (2.4). Since there are only R quantities $u_i$ but $2^R$ quantities $P_x$ then there are expected to be $(2^R - R - 1)$ identities between the set of $P_x$'s, other than the trivial

$\sum_x P_x = 1$. (3.4)

These have been considered in detail elsewhere [9], where it has been shown that there are $^R C_r$ identities of degree r in the $P_x$'s for $r \neq 0, 1$ (when (3.4) is excluded).

In particular for R = 2 there is the single quadratic identity

$P_{00}P_{11} = P_{01}P_{10}$, (3.5)

whilst for R = 3 there is the cubic identity

$P_{111}^2 = (P_{110} + P_{111})(P_{101} + P_{111})(P_{011} + P_{111})$ (3.6)

and three similar quadratic identities, one of which is given by

$(P_{010} + P_{011} + P_{110} + P_{111})(P_{001} + P_{011} + P_{101} + P_{111}) = P_{011} + P_{111}$. (3.7)

It has been usual to attempt to construct an equation of motion for the noisy net (PRAM or neuronal) in terms of a Markov chain, so having the form

$P_x(t+1) = M P_x(t)$, (3.8)

where M is a $2^R \times 2^R$ matrix, assumed to be independent of the state probability $P_x$. Such a linear evolution equation allows for a detailed analysis of the time development, using the properties of probability matrices. As Clark has recently emphasized [3], in the neuronal context, there is only one limit point in the motion, so the behaviour does not seem to be rich enough to be of interest. We will discuss the question in more detail in section 9.

The above-mentioned identities indicate the need for caution in applying (3.8), since even though $P_x(t)$ may satisfy the identities, $P_x(t + 1)$ given by (3.8) may not do so except for special values of M. This has been analysed in detail elsewhere by one of us [9] for the case of noisy neural nets. We will consider the situation in more detail later for the PRAM nets, but will commence by taking the model PRAM of section 2, and will build the equation for the net in terms of the evolution equations for each PRAM separately. The net equation will be non-linear, in comparison to the linear equation (3.8).

Let us consider the $r$th PRAM, with firing probability $u_r(t)$ at time t. Neglecting external inputs, the various inputs to the PRAM are given by the components of $[C^{(r)}u(t)]$. Let these components be denoted by $v(t) = (v_1, \ldots, v_N)$. Then by (2.4) and (2.5),

$u_r(t+1) = \sum_\alpha p_\alpha^{(r)} \prod_{i=1}^{N} v_i^{\alpha_i}\,\bar{v}_i^{\bar{\alpha}_i} = \sum_\alpha p_\alpha^{(r)} \prod_{i=1}^{N} [C^{(r)}u(t)]_i^{\alpha_i}\,[\overline{C^{(r)}u(t)}]_i^{\bar{\alpha}_i}$, (3.9)

the summation in (3.9) being over the $2^N$ choices of binary N-vectors $\alpha$. Thus (3.9) is the resulting PRAM net equation. It depends on the set of R connection matrices $C^{(r)}$ and output probabilities $\{p_\alpha^{(r)}\}$, for $1 \le r \le R$. These may be freely given within the restrictions discussed earlier. The factors on the r.h.s. of (3.9) contain independent probabilities at each time t provided the components of u are given as independent probabilities at some initial time.

A single input to each RAM may be included by restricting the product in (3.9) to $1, \ldots, (N-1)$, and including a factor $i_r^{\alpha_N}\,\bar{i}_r^{\bar{\alpha}_N}$, where $i_r$ is the probability of firing of the input into the $r$th PRAM. Outputs can evidently be taken as the u(t). Thus the full system of PRAM net equations, including input i(t) and output O(t), is

$u_r(t+1) = \sum_\alpha p_\alpha^{(r)}\, i_r^{\alpha_N}\,\bar{i}_r^{\bar{\alpha}_N} \prod_{i=1}^{N-1} [C^{(r)}u(t)]_i^{\alpha_i}\,[\overline{C^{(r)}u(t)}]_i^{\bar{\alpha}_i}$, $O(t) = u(t)$. (3.10)

We will concentrate on (3.9) for the rest of this paper, since the structure of (3.10) is similar to (3.9) apart


from the presence of forcing terms. We note that the inputs may be allowed to bifurcate, so replacing further of the products on the r.h.s. of (3.10); we will not analyse that case further here.
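As a numerical illustration of (3.9), the sketch below (ours, not the authors' program) iterates a small PRAM net; the list C[r] plays the role of the connection matrix $C^{(r)}$ by naming which PRAM output feeds each input of PRAM r, and the address-indexing convention is again an assumption of the example.

```python
from itertools import product

def net_step(u, C, p, N):
    """One synchronous step of eq. (3.9) for a net of len(u) N-PRAMs."""
    new_u = []
    for r in range(len(u)):
        v = [u[j] for j in C[r]]                    # components of C^(r) u(t)
        total = 0.0
        for idx, alpha in enumerate(product((0, 1), repeat=N)):
            w = 1.0
            for vi, ai in zip(v, alpha):
                w *= vi if ai else (1.0 - vi)
            total += p[r][idx] * w
        new_u.append(total)
    return new_u

# Cyclic net of fig. 3(c): PRAM 1 reads (u2, u3), PRAM 2 reads (u3, u1), PRAM 3 reads (u1, u2).
C = [(1, 2), (2, 0), (0, 1)]
p = [[0.1, 0.9, 0.2, 0.7], [0.3, 0.4, 0.8, 0.6], [0.5, 0.1, 0.9, 0.2]]
u = [0.2, 0.8, 0.5]
for t in range(10):
    u = net_step(u, C, p, 2)
print(u)   # typically settles close to a fixed point within a few iterations
```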

It is now possible to build up the evolution equation for the state probabilities $P_x$ of (3.3) by use of (3.9) or (3.10). The resulting equations will have non-linearities on the r.h.s. of (3.8) replacing the linear term in general, so that Markov chain techniques will no longer apply. Other than for special values of the parameters $\{p_\alpha^{(r)}\}$ the identities described earlier would contradict such linearity, except for the case of N = 1 (for in this case (3.9) is linear, leading to a linear equation of the form (3.8)). Even the N = 2 net of fig. 3(d) can only lead to (3.8) under strong restrictions on the probabilities $\{p_\alpha^{(r)}\}$: it is necessary for the probability vectors $p_\alpha^{(1)}$, $p_\alpha^{(2)}$ to take one of the four forms

p:"-- p i', ), , i?)

p:"' - (pg), pg', , ,-oo'(:' )

e ." ' - ( pi?, pi?),

E l l ~ F I I ] '

= F I 1 ' F I I 1 '

p.(:) = (pg' ,

, or ( 1 ~ 2 ) (3.11a)

(3.11b)

(3.11¢)

In general the evolution equations for $P_x$ will be a polynomial of degree R in $P_x$ on the r.h.s., whilst the equations (3.9) involve polynomials of degree N. Since R > N the simpler equations to analyse appear to be those of (3.9) (or (3.10)), for the individual PRAMs, instead of those for the probabilities on the states of the net as a whole.

We thus conclude that the PRAM net equations for R N-RAMs are polynomial maps of degree at most N of the hyperinterval [0,1] R into itself. The coefficients of the various terms are given by the connection matrices and output probability functions on the addresses of each of the N-RAMs. We will henceforth use the epithet noisy to denote a neural net satisfying either (3.9) or equivalently (4.3).

4. Noisy neural net equations

We wish to develop a system of equations for neurons which have noise inherent in them due to the stochastic nature of the transfer of information at the synapse. There have been various models proposed for this, for example those of refs. [10] and [11]; we note that the second of these may suffer from the difficulties of state space description remarked on in the previous section, and thus we turn in preference to the description of ref. [10]. We will proceed by summarising the assumptions and results of that reference. The assumptions made were the following:

(1) There are N neurons, with arbitrary connectivity, and with total activity at any time t described by the probabilities $p_j(t)$ ($1 \le j \le N$) that the neurons will fire.

(2) Time is assumed discrete, $t = 0, \tau, 2\tau, \ldots$, in terms of a smallest unit of time $\tau$, related to the refractory periods, synaptic and other delays, etc. for the various neurons. We normalise $\tau = 1$ without loss of generality.

(3) Neural firing is determined by the total amount of transmitter substance arriving on a given neuron from the other neurons (including itself) synapsing on it, with no spatial or temporal decay of postsynaptic potential over the dendrites or cell body, up to the axon hillock.


(4) Transmitter substance is released into the synaptic cleft in a spontaneous manner. The probability density function for such spontaneous release is denoted $p_s(q)$. This was chosen, for simplicity, in [10] to be a Poisson process with mean $\lambda = t_{\mathrm{dec}}/t_{\mathrm{i}}$ ($t_{\mathrm{dec}}$ is the lifetime for transmitter in the synaptic cleft, $t_{\mathrm{i}}^{-1}$ is the frequency of spontaneous release of vesicles into the cleft) and with $q_0$ the amount of transmitter contained in a single synaptic vesicle, so that

$p_s(q) = e^{-\lambda} \sum_{n \ge 0} (n!)^{-1} \lambda^n \delta(nq_0 - q)$. (4.1)

(5) Since the parameters $\lambda$ and $q_0$ will, in general, depend on the indices i and j, where j denotes the presynaptic neuron, i the postsynaptic one (and $q_0$ may be negative for an inhibitory effect), $p_s$ can be replaced by the more general $p_s^{(ij)}$ to take account of this variability of spontaneous transmitter release on the pre- and postsynaptic cell. The arrival of an action pulse which has travelled from the jth to the ith cell causes the release of $n_{ij}$ further vesicles into the synapse. No spontaneous release was assumed to occur then, so that the "deterministic" density function for transmitter substance in this case, $p_d^{(ij)}(q)$, takes the form

$p_d^{(ij)}(q) = \delta(q - n_{ij} q_0^{(ij)})$. (4.2)

We may, still more generally, take a form for $p_d^{(ij)}(q)$ which takes account of spontaneous vesicle release during arrival of an action pulse at a synapse, as well as for variability of the size of synapses, etc.

(6) The ith neuron fires at a given time (t + 1) if the total amount of transmitter substance (spontaneous, due to leakage through the density function $p_s$ such as (4.1), or excited deterministically through a function $p_d$ such as in (4.2)) is larger than some critical value $q_c$, which in general may depend on i. More generally we can assume that the probability of firing of neuron i is determined by a probability distribution $p_f^{(i)}(q)$ which depends on the total amount of transmitter substance q entering the neuron at time t.

The above assumptions thus determine the equations governing the time development of the set $\{p_i(t)\}$, as given in [10]:

$p_i(t+1) = \int dq\, p_f^{(i)}(q)\, \Big[\prod_{j=1}^{N} \int dq_{ji}\Big]\, \delta\Big(q - \sum_{j=1}^{N} q_{ji}\Big) \prod_{j=1}^{N}\big[\,p_j(t)\, p_d^{(ij)}(q_{ji}) + \bar{p}_j(t)\, p_s^{(ij)}(q_{ji})\,\big]$. (4.3)

Here q is set equal to $\sum_j q_{ji}$ by the δ-function, where each $q_{ji}$ has as probability density function the jth factor in the final product in (4.3). Each such factor is composed of two terms: a deterministic one, proportional to the product of the probability $p_j(t)$ of firing of the jth neuron at time t and $p_d^{(ij)}(q_{ji})$, and a spontaneous one proportional to the product of the probability $\bar{p}_j(t)$ of no firing of the jth neuron at time t and $p_s^{(ij)}(q_{ji})$.

We should add here that (4.3) reduces to the deterministic equations of Caianiello [7] in the limit $q_0^{(ij)} \to 0$, $n_{ij} q_0^{(ij)} \to$ constant, as already discussed in [10]. Moreover (4.3) becomes linear if the $p_j(t) \ll 1$, as used by Cooper et al. [22].


Fig. 5. A net of two 2-RAMS with self-feedback.

The dynamical equation (4.3) is seen to be of the same form as (3.9), where the constants $p_\alpha^{(i)}$, for fixed i, are determined by the set of distributions $p_f^{(i)}$, $p_d^{(ij)}$, $p_s^{(ij)}$. Thus in the case of N = 2, with the connectivity of fig. 5, we have a formal identity between the PRAM net equation (3.9) and its neuronal counterpart (4.3):

$u_1(t+1) = (\alpha_0\bar{u}_1\bar{u}_2 + \alpha_1 u_1\bar{u}_2 + \alpha_2\bar{u}_1 u_2 + \alpha_3 u_1 u_2)(t)$, (4.4a)
$u_2(t+1) = (\beta_0\bar{u}_1\bar{u}_2 + \beta_1 u_1\bar{u}_2 + \beta_2\bar{u}_1 u_2 + \beta_3 u_1 u_2)(t)$, (4.4b)

with

$\alpha_0 = \int p_f^{(1)}(q)\, dq \int dq_1\, p_s^{(11)}(q_1)\, p_s^{(12)}(q - q_1)$,
$\alpha_1 = \int p_f^{(1)}(q)\, dq \int dq_1\, p_d^{(11)}(q_1)\, p_s^{(12)}(q - q_1)$,
$\alpha_2 = \int p_f^{(1)}(q)\, dq \int dq_1\, p_s^{(11)}(q_1)\, p_d^{(12)}(q - q_1)$,
$\alpha_3 = \int p_f^{(1)}(q)\, dq \int dq_1\, p_d^{(11)}(q_1)\, p_d^{(12)}(q - q_1)$. (4.5)

It seems clear that for any choice of constants $p_\alpha^{(r)}$ in (3.9) it should always be possible to choose $(2N^2 + N)$ distribution functions $p_s^{(ij)}$, $p_d^{(ij)}$, $p_f^{(i)}$ so that the constants $p_\alpha^{(i)}$ are determined, in the case N = 2, by equations similar to (4.5), $p_\alpha^{(i)}$ being obtained as the $2^N$ moments of $p_f^{(i)}$ convoluted with the N densities $p_s^{(ij)}$ or $p_d^{(ij)}$ for $1 \le j \le N$. It is to be expected that all possible sets of independent constants $p_\alpha^{(i)} \in [0, 1]$ can be obtained by suitable choices of sets of density functions $p_s^{(ij)}$, $p_d^{(ij)}$ or $p_f^{(i)}$ (in many ways, in fact). Thus we have established the equivalence of PRAM and noisy neural nets claimed earlier.
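The identification (4.5) can be made concrete under simplifying assumptions that are ours rather than the paper's: take the firing distribution $p_f^{(1)}$ to be a hard threshold at $q_c$, spontaneous release at synapse j to be Poisson with mean lam[j] in vesicles of size q0[j] as in (4.1), and an arriving pulse to release exactly n[j] vesicles as in (4.2). Each $\alpha_k$ is then the probability that the summed transmitter from the two synapses exceeds $q_c$, as in the sketch below.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def fire_prob(active, lam, q0, n, q_c, kmax=60):
    """Probability that total transmitter exceeds q_c; active[j] = 1 if an
    action pulse arrived on synapse j (deterministic release, eq. (4.2)),
    0 if the synapse only releases spontaneously (eq. (4.1))."""
    prob = 0.0
    for k1 in range(kmax):
        for k2 in range(kmax):
            q, w = 0.0, 1.0
            for j, k in ((0, k1), (1, k2)):
                if active[j]:
                    q += n[j] * q0[j]
                    w *= 1.0 if k == 0 else 0.0    # all mass at n[j]*q0[j]
                else:
                    q += k * q0[j]
                    w *= poisson_pmf(k, lam[j])
            if q > q_c:
                prob += w
    return prob

# Illustrative (assumed) synaptic parameters for neuron 1 with two input synapses.
lam, q0, n, q_c = [0.5, 0.8], [1.0, 1.0], [3, 2], 2.5
alphas = [fire_prob(a, lam, q0, n, q_c) for a in ((0, 0), (1, 0), (0, 1), (1, 1))]
print(alphas)   # plays the role of (alpha_0, alpha_1, alpha_2, alpha_3) in (4.4a)
```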

This result indicates that the dynamical development of a system controlled by (3.9) is not only of interest but also provides a bridge between a semi-realistic model of living neural nets and physically realisable electronic networks. We will proceed to investigate that possibility in the next section.

5. General behaviour of PRAM nets

In the last decade or more there has been an enormous development in the understanding of dynamical systems, both from the pure and applied viewpoint [6, 12]. Even for one- or two-dimensional systems, polynomial maps, or more generally diffeomorphisms, can have extremely complicated behaviour under


iteration; this has been seen both from the mathematical point of view [13], and from the aspect of approximately modelled biological systems [14]. However the general features seem to be that under iteration we may expect flow towards attractor sets in general of dimensions less than that of the whole space. These may be of dimension zero (limit points), one (limit cycles) or fractal (strange attractors). This latter can even occur for a quadratic map of two variables [15], so that we must be prepared to accept all three possibilities in the case of PRAM/neural networks. (But note that the only one variable case we could have is of a PRAM connected to itself, leading solely to a linear equation, and hence being of the form (3.8).) Moreover the possible attractor sets may change drastically as the parameters change, as occurs, for example, in the Hénon map [15]

$x(t+1) = y(t) + 1 - ax^2(t)$,  $y(t+1) = bx(t)$,

with the critical value of a being between $\frac{1}{4}(1-b)^2$ and 1.55 (for b = 0.3) (Hénon took a = 1.4 for b = 0.3 to obtain a chaotic attractor).

The resultant motion towards attractors divides the variable space $[0,1]^R$ into a set of basins of attraction with separatrices dividing them. The ability to use the resulting noisy net activity in a connectionist manner thus results from a knowledge of the dependence of the attractors and their basins in $[0,1]^R$ on the internal parameters $\{p_\alpha^{(r)}\}$ and $\{C^{(r)}\}$. If inputs are regarded as initial values of the PRAM-net activity, or as given only for a finite time in (3.10), then resulting activity is determined by the attractors in whose basin the noisy net finds itself. The outputs are thus known, by (3.10), so can be used to distinguish between (or classify) the possible inputs. Modification of this distinction will therefore be possible either directly or by teaching inputs specifically designed for that purpose. Interest in training algorithms of various sorts has grown considerably recently [1, 5], but we will not discuss training in any detail here, our primary purpose being to analyse the nature of the attractors and their basins for (3.9), (3.10).

As is immediately clear from the lack of understanding of these problems for even two-dimensional

maps, a great deal of further analysis is necessary. However one criterion is clear, which is based on the general feature of strange attractors, that they give rise to motion extremely dependent on the initial conditions. Such sensitivity is clearly to be avoided. In order to use a net in the manner indicated earlier the following criteria are therefore necessary:

Criterion 1. Regions of the parameter space P of $\{p_\alpha^{(r)}\}$, $\{C^{(r)}\}$ which lead to strange attractors must be avoided. In order to use a noisy net most effectively it would also seem necessary to reduce the size of attractors as much as possible. This may best be done by dividing $[0, 1]^R$ into a set of non-overlapping cells, each of which has an attracting limit point. We thus obtain

Criterion 2. Regions of the parameter space leading to the maximum number of attracting limit points are preferred. Since at most N fixed points are to be expected, we must also hope to obtain limit cycles as well, which may approximate the periodic motions (of period up to $2^R$) of the noisy net considered earlier in section 3.

There are two general properties of noisy nets which can be proved reasonably quickly. The first is that there always exists a fixed point of the noisy net equations (3.9). This is so because the r.h.s. of (3.9) defines a polynomial (and so continuous) map of $[0,1]^R$ into itself, and so possesses at least one fixed point, by the


Brouwer fixed point theorem [16]. The stability of this fixed point is not guaranteed, and cannot be determined without much more detailed analysis. We will return to these questions in section 9 after analysis of specific cases and a discussion of the computer simulation of various types of networks.

The second property is that for a suitable range of the parameters $p_\alpha$ in (3.9) there is a unique attracting fixed point. This follows from the contraction mapping principle [16]. Thus if we consider the map on the r.h.s. of (4.4), which is a special case of (3.9) for N = 2, for two different iterations $u_i(t)$ and $u_i'(t)$ ($i = 1, 2$),

$|u_1(t+1) - u_1'(t+1)| \le (|\alpha_1 - \alpha_0| + |\alpha_3 + \alpha_0 - \alpha_1 - \alpha_2|)\,|u_1(t) - u_1'(t)| + (|\alpha_2 - \alpha_0| + |\alpha_3 + \alpha_0 - \alpha_1 - \alpha_2|)\,|u_2(t) - u_2'(t)|$, (5.2)

with a similar equation for $|u_2(t+1) - u_2'(t+1)|$ with the constants $\alpha_\mu$ on the r.h.s. of (5.2) replaced by $\beta_\mu$. Thus the r.h.s. of (4.4) will be a contraction map in the topology

$\|(t_1, t_2)\| = \max(|t_1|, |t_2|)$, $(t_1, t_2) \in [0,1]^2$, (5.3)

provided

$\max\{(|\alpha_1 - \alpha_0| + |\alpha_2 - \alpha_0| + 2|\alpha_3 + \alpha_0 - \alpha_1 - \alpha_2|),\ (|\beta_1 - \beta_0| + |\beta_2 - \beta_0| + 2|\beta_3 + \beta_0 - \beta_1 - \beta_2|)\} < 1$. (5.4)

We may choose other topologies on the space of mappings of $[0,1]^2$ into itself without any real gain over the result (5.4), as far as we can see. A similar result holds for the more general case of (3.9) for general R and connectivity matrices $\{C^{(r)}\}$. The r.h.s. is a contraction map in one of the topologies on maps $[0,1]^R$ to itself provided the set of constants $p_\alpha^{(r)}$, for a given r, satisfy inequalities extending those of (5.4) in a natural fashion. In general these inequalities require the $p_\alpha^{(r)}$, for fixed r and $\alpha \neq 0$, to be suitably close to $p_0^{(r)}$. In other words the values of the $p_\alpha^{(r)}$ are close to those corresponding solely to spontaneous activity. This result is seen to be expected since if $p_\alpha^{(r)} = p^{(r)}$, for all $\alpha$ and each r, then the r.h.s. of (3.9) reduces to $p^{(r)}$ and so $u_r(t+1) = p^{(r)} =$ constant. It is to be expected that small variations of the parameters $p_\alpha^{(r)}$ would not disturb the existence of the unique fixed point $u_r(t) = p^{(r)}$.
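A quick numerical check of the contraction bound (5.4) — our sketch, not part of the paper — is to draw the address probabilities close to a common spontaneous value, verify that the bound is below one, and watch two different trajectories of the N = 2 map (4.4) collapse together.

```python
import random

def step(u1, u2, c):
    # One component of (4.4): c = (c0, c1, c2, c3) are the address probabilities.
    return c[0]*(1-u1)*(1-u2) + c[1]*u1*(1-u2) + c[2]*(1-u1)*u2 + c[3]*u1*u2

def bound(c):
    # The bracketed quantity in (5.4) for one set of address probabilities.
    return abs(c[1]-c[0]) + abs(c[2]-c[0]) + 2*abs(c[3]+c[0]-c[1]-c[2])

random.seed(1)
alpha = [0.5 + 0.05*random.random() for _ in range(4)]   # close to pure spontaneous activity
beta  = [0.5 + 0.05*random.random() for _ in range(4)]
print("contraction bound:", max(bound(alpha), bound(beta)))   # well below 1 here
u, v = (0.1, 0.9), (0.8, 0.2)
for _ in range(10):
    u = (step(u[0], u[1], alpha), step(u[0], u[1], beta))
    v = (step(v[0], v[1], alpha), step(v[0], v[1], beta))
print("distance after 10 steps:", max(abs(u[0]-v[0]), abs(u[1]-v[1])))
```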

In order to understand the possible motions which may arise from noisy nets governed by (3.9) or (3.10), we will consider special cases in the next section before returning to a general analysis.

6. Soluble noisy 2-nets

(a) The simplest non-trivial case of such nets is the inhomogeneous one of fig. 3(g). By change of notation, let $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ be the probability of firing given input (00), (01), (10), and (11) for the 2-PRAM, and $\beta_0, \beta_1$ be the corresponding probabilities associated with inputs 0 and 1 for the 1-PRAM. If $u_2$ and $u_1$ denote the firing probabilities of the 2- and 1-PRAMs, the equations of motion (3.9) become

$u_1(t+1) = \beta_0\bar{u}_2(t) + \beta_1 u_2(t)$,
$u_2(t+1) = (\alpha_0\bar{u}_1\bar{u}_2 + \alpha_1 u_1\bar{u}_2 + \alpha_2\bar{u}_1 u_2 + \alpha_3 u_1 u_2)(t)$. (6.1)


This can be rewritten in terms of $u_1$ and $u_2$ as

$u_1(t+1) = [b_0 + b_1 u_2](t)$,
$u_2(t+1) = [a_0 + a_1 u_1 + a_2 u_2 + a_3 u_1 u_2](t)$, (6.2)

where

$b_1 = \beta_1 - \beta_0$, $b_0 = \beta_0$, $a_3 = \alpha_0 + \alpha_3 - \alpha_1 - \alpha_2$, $a_2 = \alpha_2 - \alpha_0$, $a_1 = \alpha_1 - \alpha_0$, $a_0 = \alpha_0$. (6.3)

The conditions, for probabilities,

$0 \le \alpha_i, \beta_j \le 1$ ($0 \le i \le 3$, $0 \le j \le 1$), (6.4)

lead to the constraints

$0 \le b_0,\ (b_0 + b_1),\ a_0,\ (a_0 + a_1),\ (a_0 + a_2),\ (a_0 + a_1 + a_2 + a_3) \le 1$. (6.5)

If we rewrite (6.2) as

$x_{i+1} = a_0 + a_1 y_i + a_2 x_i + a_3 x_i y_i$,
$y_{i+1} = b_0 + b_1 x_i$, (6.6)

it is possible to reduce (6.6) to a canonical form by the linear transformation

$X = A + x$, $Y = B + y$ (6.7a)

as the 2-parameter mapping

$X_{i+1} = \lambda X_i + \nu Y_i + a_3 X_i Y_i$,
$Y_{i+1} = \mu X_i$, (6.7b)

where $\mu = b_1$, $\nu = a_1 - (a_3 b_0/b_1)$, $\lambda = a_2 - [a_3(a_2 b_0 - a_0 b_1)/(a_3 b_0 - a_1 b_1)]$. The fixed points of (6.7b) are (0, 0) and $((1 - \lambda - \nu\mu)/a_3\mu,\ (1 - \lambda - \nu\mu)/a_3)$. The former of these is never possible, since $X \in [A, A+1]$, with $A = b_0/b_1 \ge 0$, unless $b_0 = 0$. In the contrary case there is just one fixed point of (6.7b).

(b) The next non-trivial case is that of fig. 3(d). With the same notation as in the previous case (a), but

now with $\beta_0, \beta_1, \beta_2, \beta_3$ being the probabilities for firing of 2-PRAM 2 on inputs (00), (01), (10) and (11) (similar to the values $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ for 2-PRAM 1), the equation of motion analogous to (6.1) is

$u_1(t+1) = (\alpha_0\bar{u}_1\bar{u}_2 + \alpha_1 u_1\bar{u}_2 + \alpha_2\bar{u}_1 u_2 + \alpha_3 u_1 u_2)(t)$,
$u_2(t+1) = (\beta_0\bar{u}_2\bar{u}_1 + \beta_1 u_2\bar{u}_1 + \beta_2\bar{u}_2 u_1 + \beta_3 u_2 u_1)(t)$. (6.8)

We see that this reduces to (6.1) on taking $\beta_0 = \beta_2$, $\beta_1 = \beta_3$ and interchanging $u_1$ and $u_2$. We will consider this reduction later. Again (6.8) may be rewritten in a form similar to (6.2) as

$u_1(t+1) = (a_0 + a_1 u_1 + a_2 u_2 + a_3 u_1 u_2)(t)$,
$u_2(t+1) = (b_0 + b_1 u_1 + b_2 u_2 + b_3 u_1 u_2)(t)$, (6.9)

with the $a_i$ given in terms of the $\alpha_i$ as in (6.3) and the $b_i$ given by similar formulae in terms of the $\beta_i$, with the


associated constraints

$0 \le a_0,\ (a_1 + a_0),\ (a_2 + a_0),\ (a_3 + a_2 + a_1 + a_0) \le 1$ (6.10)

and similarly for the $b_i$'s. We note again that (6.9) is a quadratic mapping of two variables with linear Jacobian (slightly more general than the previous case (a)),

$J = [(a_1 b_2 - a_2 b_1) + (a_1 b_3 - a_3 b_1)u_1 + (a_3 b_2 - a_2 b_3)u_2]$. (6.11)

The map (6.9) can now again be reduced by a map similar to (6.7a), now leading to the canonical form (for $a_3, b_3 \neq 0$)

$X_{i+1} = (a_1 - \mu a_3)X_i + \dfrac{b_3}{a_3}(a_2 - \lambda a_3)Y_i + X_i Y_i$,
$Y_{i+1} = \dfrac{a_3}{b_3}(b_1 - \mu b_3)X_i + (b_2 - \lambda b_3)Y_i + X_i Y_i$, (6.12a)

with $\lambda$, $\mu$ satisfying

$\lambda(1 - a_1) + a_0 + \mu(\lambda a_3 - a_2) = 0$,
$\mu(1 - b_2) + b_0 + \lambda(\mu b_3 - b_1) = 0$, (6.12b)

which is a 4-parameter map, generalising (6.7b) appropriately, and having 2 fixed points instead of the one of (6.7b). The general case (6.9) appears to be complicated to study in this manner, so instead we will return directly to the parametrisation of (6.8).

The fixed points of (6.9) are given by, for example, substituting

$u_2 = (b_0 + b_1 u_1)/(1 - b_2 - b_3 u_1)$, (6.13)

obtained by solving the second of the equations in (6.9), into the first of these equations. There results a quadratic equation for $u_1$ of the form

$Au_1^2 + Bu_1 + C = 0$, (6.14a)

where

$A = a_3 b_1 + b_3(1 - a_1)$,
$B = a_3 b_0 + a_2 b_1 - a_0 b_3 - (1 - a_1)(1 - b_2)$,
$C = a_2 b_0 + (1 - b_2)a_0$, (6.14b)

and the 2 solutions

$u_{1\pm} = (2A)^{-1}\big[-B \pm \sqrt{B^2 - 4AC}\,\big]$. (6.15)

The related solutions for $u_2$ may be obtained by substituting (6.15) in (6.13) or more directly by repeating


the above procedure on interchanging $u_1$ and $u_2$. This gives the corresponding fixed-point values of $u_2$ as

$u_{2\pm} = (2D)^{-1}\big[-E \pm \sqrt{E^2 - 4FD}\,\big]$, (6.16)

where the ± signs in (6.15) and (6.16) go together, and

$D = b_3 a_2 + a_3(1 - b_2)$,
$E = a_0 b_3 + a_2 b_1 - a_3 b_0 - (1 - a_1)(1 - b_2)$,
$F = a_0 b_1 + (1 - a_1)b_0$. (6.17)

The stability of the fixed point is given in terms of the eigenvalues of the matrix

$M = \begin{pmatrix} a_1 + a_3 u_2 & a_2 + a_3 u_1 \\ b_1 + b_3 u_2 & b_2 + b_3 u_1 \end{pmatrix} = M(u_1, u_2)$, (6.18)

with values

$\lambda_\pm = \tfrac{1}{2}\big[\mathrm{tr}\,M \pm ((\mathrm{tr}\,M)^2 - 4\det M)^{1/2}\big] = \lambda_\pm(u_1, u_2)$. (6.19)

The Jacobian det M is only constant in the special case where there is a relation between $u_1(t+1)$ and $u_2(t+1)$: if J of (6.11) is a constant then $b_1/a_1 = b_2/a_2 = b_3/a_3 = \sigma$, so that from (6.9)

$u_2(t+1) - \sigma u_1(t+1) = b_0 - \sigma a_0$. (6.20)

We may therefore use (6.20) to reduce (6.9) to an equation for $u_1$ alone; we will discuss the resulting equation later.

Further analysis of (6.9) may be achieved by determining the conditions under which both $(u_{1+}, u_{2+})$ and $(u_{1-}, u_{2-})$ are inside or on the boundary of $[0,1]^2$. This could be determined analytically from the inequalities (6.10), but we have not found this easy to do. Instead we used a random choice of values of the parameters $\alpha_i, \beta_i \in [0,1]$ and determined the resulting values of $(u_{1\pm}, u_{2\pm})$. $10^6$ such choices of the parameters always gave

$(u_{1-}, u_{2-}) \in [0,1]^2$, $(u_{1+}, u_{2+}) \notin [0,1]^2$. (6.21)

These randomised parameter values did not lie on the boundary of parameter space, for which (6.21) may be violated; we will discuss the result of restricting $\alpha_i, \beta_i$ to $\{0, \tfrac{1}{2}, 1\}$ later. Continuing with the generic case (6.21), for each randomised choice of parameters the eigenvalues $\lambda_\pm(u_{1-}, u_{2-})$ given by (6.19) were determined. In particular the cases of (i) real and (ii) complex eigenvalues were distinguished. In case

(i) the mapping (6.9) near $(u_{1-}, u_{2-})$ may be reduced to one in one dimension by use of (6.13), of the form

$u_1 \to F(u_1) = a_0 + a_1 u_1 + (a_2 + a_3 u_1)(b_0 + b_1 u_1)/(1 - b_2 - b_3 u_1)$. (6.22)

The derivative $F'(u_{1-})$ was then determined; $(u_{1-}, u_{2-})$ would then be an unstable fixed point if

$|F'(u_{1-})| > 1$. (6.23)


This condition was not satisfied in the cases considered in the computer run. For case (ii), a Hopf bifurcation of the fixed point $(u_{1-}, u_{2-})$ into an attractive invariant circle would have occurred if

$|\lambda_\pm(u_{1-}, u_{2-})| > 1$. (6.24)

Again no case satisfying (6.24) was discovered among the $10^6$ trials described above. Thus we conjecture that:

For generic values of the parameters $\alpha_i, \beta_i$ in (6.8), the dynamical evolution of a $2 \times (N = 2)$ noisy net is always to a unique stable fixed point.

Evidence for the above conjecture has been given by computer methods, but since the inequalities (6.10) form a connected set of non-zero dimension equal to that of the full parameter space, there is no question of fractal-type behaviour entering. In other words if the inequalities (6.23) or (6.24) had been satisfied in some part of the interior of the region (6.10), then this should have been observed in our computer simulation.
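The randomised scan just described is easy to reproduce in outline. The sketch below (ours) draws $\alpha, \beta \in [0,1]^4$, converts them to the $(a_i, b_i)$ of (6.9) via (6.3), solves the quadratic (6.14) for the fixed points, recovers $u_2$ from (6.13), and records the largest eigenvalue modulus of the Jacobian (6.18) found at any fixed point lying inside $[0,1]^2$.

```python
import math, random

def coeffs(v):                        # (6.3): multilinear coefficients from address probabilities
    c0, c1, c2, c3 = v
    return (c0, c1 - c0, c2 - c0, c0 + c3 - c1 - c2)

def spectral_radius(a, b, u1, u2):    # eigenvalue moduli of M in (6.18)
    m11, m12 = a[1] + a[3]*u2, a[2] + a[3]*u1
    m21, m22 = b[1] + b[3]*u2, b[2] + b[3]*u1
    tr, det = m11 + m22, m11*m22 - m12*m21
    disc = tr*tr - 4*det
    if disc >= 0:
        r = math.sqrt(disc)
        return max(abs((tr + r)/2), abs((tr - r)/2))
    return math.sqrt(det)             # complex pair: |lambda| = sqrt(det M)

random.seed(0)
worst = 0.0
for _ in range(100000):
    a = coeffs([random.random() for _ in range(4)])
    b = coeffs([random.random() for _ in range(4)])
    A = a[3]*b[1] + b[3]*(1 - a[1])                              # (6.14b)
    B = a[3]*b[0] + a[2]*b[1] - a[0]*b[3] - (1 - a[1])*(1 - b[2])
    C = a[2]*b[0] + (1 - b[2])*a[0]
    disc = B*B - 4*A*C
    if abs(A) < 1e-9 or disc < 0:
        continue
    for u1 in ((-B - math.sqrt(disc))/(2*A), (-B + math.sqrt(disc))/(2*A)):
        den = 1 - b[2] - b[3]*u1
        if abs(den) < 1e-9:
            continue
        u2 = (b[0] + b[1]*u1)/den                                # (6.13)
        if 0.0 <= u1 <= 1.0 and 0.0 <= u2 <= 1.0:
            worst = max(worst, spectral_radius(a, b, u1, u2))
print("largest spectral radius found at an interior fixed point:", worst)
```

In runs of this kind the printed value should stay below one if the conjecture holds; it is a spot check, not a proof.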

We turn to non-generic values of the parameters $\alpha_i, \beta_i$ in (6.8). We do not have a complete specification of all possible parameters giving non-trivial behaviour of (6.8). However a computer run (on the KQC VAX) was performed using a randomised (with the system utility MTH$RANDOM) choice of parameters $\alpha_i, \beta_i$ equal to 0, ½ or 1. Each case was checked for convergence under iteration starting from randomised values of $(u_1, u_2) \in [0,1]^2$. A significant number of cases were found where convergence did not occur to 0.001 after 500 iterations. These were found to arise from parameters which gave $(u_{1+}, u_{2+}) = (u_{1-}, u_{2-}) = (u_1, u_2) \in \{(1,0), (0,1), (0,0), (1,1)\}$, when the motion for $(u_1, u_2)$ was ultimately a 2-cycle

$(u_1, u_2) \leftrightarrow (c, u_2)$ (when $u_1 = u_2$), or $(u_1, c) \leftrightarrow (1 - c, u_2)$ (when $u_1 \neq u_2$), (6.25)

where c depended on the initial values of $(u_1, u_2)$. A typical choice of parameters in (6.8) leading to (6.25) corresponds to (6.8) reducing to

$u_1(t+1) = \bar{u}_2(t) + \bar{u}_1(t)u_2(t)$, $u_2(t+1) = u_1(t)$. (6.26)

Choices of the parameters which linearise (6.8) can give rise to similar 2-cycles, or even 4-cycles as in the case

$u_1(t+1) = 1 - u_2(t)$,
$u_2(t+1) = u_1(t)$, (6.27)

for which

$(c_1, c_2) \to (1 - c_2, c_1) \to (1 - c_1, 1 - c_2) \to (c_2, 1 - c_1) \to (c_1, c_2)$. (6.28)

However computer simulation has never produced more than a 2-cycle when $(u_{1+}, u_{2+}) = (u_{1-}, u_{2-})$ is on the boundary of $[0,1]^2$, except in the linearised cases like (6.27) (when there is only one fixed point). We


conclude that the non-generic cases can lead to cyclic behaviour, but that since this may be vulnerable to asynchronous updating we will not pursue it further here (but discuss it in section 8). The example (a) above has $b_2 = b_3 = 0$ in (6.9). It therefore has generic time evolution given by the unique fixed point theorem. The net will also have cyclic behaviour for non-generic values of the parameters.

Finally we return to consider the special case for which det M = constant, which satisfies the conservation law (6.20), and note that since this is a special, but non-boundary, case of the general $2 \times (N = 2)$ net, the behaviour will come under that of the unique fixed point theorem.

7. Higher nets

It is not possible to use the combined analytic/computer techniques of the previous two sections for more complicated nets owing to the difficulty of obtaining analytic solutions. For example the 3 x (N = 2) noisy cyclic net of fig. 3(c) has dynamical equations (3.9) given by

$u_1(t+1) = (\alpha_0\bar{u}_2\bar{u}_3 + \alpha_1 u_2\bar{u}_3 + \alpha_2\bar{u}_2 u_3 + \alpha_3 u_2 u_3)(t)$,
$u_2(t+1) = (\beta_0\bar{u}_3\bar{u}_1 + \beta_1 u_3\bar{u}_1 + \beta_2\bar{u}_3 u_1 + \beta_3 u_3 u_1)(t)$,
$u_3(t+1) = (\gamma_0\bar{u}_1\bar{u}_2 + \gamma_1 u_1\bar{u}_2 + \gamma_2\bar{u}_1 u_2 + \gamma_3 u_1 u_2)(t)$, (7.1)

with $\alpha_i, \beta_i, \gamma_i \in [0,1]$. There are other nets obtainable from (7.1) by choosing different connectivity in fig. 3(c), but these arise from (7.1) by a permutation of the parameters, leading to the same form as (7.1). (7.1) may be rewritten as the quadratic map

$u_1(t+1) = (a + bu_2 + cu_3 + du_2 u_3)(t)$, (7.2a)
$u_2(t+1) = (e + fu_1 + gu_3 + hu_1 u_3)(t)$, (7.2b)
$u_3(t+1) = (l + mu_1 + nu_2 + qu_1 u_2)(t)$. (7.2c)

To find the fixed points of (7.2), $u_3$ may be eliminated from (7.2c) and substituted into (7.2a, b), to give the equivalent dynamical mapping

$u_1(t+1) = (A + Bu_1 + Cu_2 + Du_1 u_2 + Eu_2^2 + Fu_1 u_2^2)(t)$, (7.3a)
$u_2(t+1) = (A' + B'u_1 + C'u_2 + D'u_1 u_2 + E'u_1^2 + F'u_1^2 u_2)(t)$. (7.3b)

$u_2$ may then be eliminated from (7.3b) to give the quintic equation

$(1 - C' - D'u_1 - F'u_1^2)^2[A - (1 - B)u_1] + (1 - C' - D'u_1 - F'u_1^2)(C + Du_1)(A' + B'u_1 + E'u_1^2) + (A' + B'u_1 + E'u_1^2)^2(E + Fu_1) = 0$. (7.4)


Fig. 6. (a) Ten 1-RAMs assembled into 3 rings, of lengths 3, 1, 6 respectively. (b) A single ring of n 1-RAMs.

In (7.2) and (7.3) the constants $a, \ldots, q$ and $A, \ldots, F'$ are all given in an obvious but complicated manner in terms of the $\alpha_i, \beta_i, \gamma_i$. (7.4) has no analytic solution for $u_1$ in terms of its coefficients, so we cannot give an analytic discussion of the number and stability of the fixed points.

A similar situation holds for all higher net equations, such as that for the $4 \times (N = 3)$ cyclically connected net of fig. 7, with

$u_1(t+1) = (\alpha_0\bar{u}_2\bar{u}_3\bar{u}_4 + \alpha_1 u_2\bar{u}_3\bar{u}_4 + \alpha_2\bar{u}_2 u_3\bar{u}_4 + \alpha_3 u_2 u_3\bar{u}_4 + \alpha_4\bar{u}_2\bar{u}_3 u_4 + \alpha_5 u_2\bar{u}_3 u_4 + \alpha_6\bar{u}_2 u_3 u_4 + \alpha_7 u_2 u_3 u_4)(t)$,
$u_2(t+1) = (\beta_0\bar{u}_3\bar{u}_4\bar{u}_1 + \ldots)(t)$,
$u_3(t+1) = (\gamma_0\bar{u}_4\bar{u}_1\bar{u}_2 + \ldots)(t)$,
$u_4(t+1) = (\delta_0\bar{u}_1\bar{u}_2\bar{u}_3 + \ldots)(t)$, (7.5)

with $\alpha_i, \beta_i, \gamma_i, \delta_i \in [0, 1]$ for $i = 0, 1, \ldots, 7$. In order to see if the results of the previous section can be extended to these higher nets we resorted to computer analysis in toto. This was done first for the cases of the two nets described by (7.1) and (7.5) respectively. The method of randomisation of the parameters $\alpha_i$, etc. described earlier was used here, and for each random choice a random choice of initial values of the u's was chosen. 100 iterations were then performed, and the result was noticed (the programme exited) if convergence was not achieved on $u_i$ to 0.001, i.e. if $|u_i(101) - u_i(100)| > 0.001$. In 50000 such runs, on each of (7.1) and (7.5), no exits occurred. Another trial was done for (7.1) with the same criteria for exiting, but one hundred different starting positions (chosen at random) for the u's for each random choice of parameters $\alpha, \beta, \gamma$ (or $\delta$). No exits occurred in $10^4$ runs.

To test the hypothesis that the $2 \times (N = 2)$ unique fixed point theorem was valid for higher nets than the $3 \times (N = 2)$ and the $4 \times (N = 3)$ cases tested above, it proved difficult to preserve the maximal connectedness of the $(N + 1) \times (N)$ case, due to the number of parameters increasing as $N \times 2^N$. We thus considered $n \times$ (N-RAM) nets in which n was allowed to be large but N was not, so that the connectivity was kept low. In particular we investigated nets composed of n N-RAMs randomly connected together for n = 20 and 50, and for N = 1 to 10. The average number of iterations (run length) to obtain convergence to 0.001 on all of $u_1, \ldots, u_n$ was determined on 100 runs with randomised starting values and the same memory content parameters and connectivity. These results are plotted in fig. 8. It is seen that there is little difference between n = 20 and n = 50, and that convergence is very rapid (less than five iterations) for $N \le 7$.
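The randomly connected n × N-PRAM experiment summarised in fig. 8 can be sketched as follows (our own illustration, not the authors' program); a single random net is built per value of N, and the run length is averaged over random starting vectors.

```python
import random
from itertools import product

def step(u, C, p, N):
    """One synchronous step of (3.9): C[r] lists the PRAMs feeding PRAM r."""
    out = []
    for r in range(len(u)):
        v = [u[j] for j in C[r]]
        s = 0.0
        for idx, alpha in enumerate(product((0, 1), repeat=N)):
            w = 1.0
            for vi, ai in zip(v, alpha):
                w *= vi if ai else 1.0 - vi
            s += p[r][idx] * w
        out.append(s)
    return out

def average_run_length(n, N, rng, runs=20, tol=1e-3, max_iter=100):
    C = [[rng.randrange(n) for _ in range(N)] for _ in range(n)]     # random connectivity
    p = [[rng.random() for _ in range(2 ** N)] for _ in range(n)]    # random memory contents
    total = 0
    for _ in range(runs):
        u = [rng.random() for _ in range(n)]
        for t in range(1, max_iter + 1):
            u_new = step(u, C, p, N)
            if max(abs(a - b) for a, b in zip(u, u_new)) < tol:
                break
            u = u_new
        total += t
    return total / runs

rng = random.Random(42)
for N in (2, 3, 5, 7):
    print("n=20, N=%d, mean iterations to convergence: %.1f" % (N, average_run_length(20, N, rng)))
```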


Fig. 7. The detailed connections of the four 3-RAMs used in the computer simulation.

Fig. 8. The average run length (averaged over 100 runs) to convergence, plotted against N (the order of connectivity), for a randomly connected net of n N-RAMs with n = 20 and n = 50. Convergence was tested for $|u_i(t+1) - u_i(t)| < 0.001$ for $1 \le i \le n$.

8. Conclusions

The computer analysis of the previous section for higher nets may be combined with the analytic/computer discussion for the specific $2 \times (N = 2)$ and $(N = 1) + (N = 2)$ nets in section 6 to lead us to conjecture that:

The dynamical evolution of a general noisy net is to a unique, stable, fixed point.

We expect the above conjecture to be valid for noisy nets composed of N-RAMs with differing values of N. This behaviour appears to be a general property of mappings of the type of (3.9), or their generalisation to the case of an inhomogeneous net. Thus the criterion 1 of section 5 (that there be no strange attractors) seems to be satisfied for general values of the parameters. Our detailed analysis in section 6 supports the view that there are in these cases no strange attractors for any values of the parameters. This indicates the difference between noisy nets and those described by other evolution equations, for which chaos, and the appearance of strange attractors, occurs [17].

The unique fixed point conjecture also shows the difference between the behaviour of a noisy net and of a Hopfield net [4], for which there are many stable fixed points. Indeed our results indicate that criterion 2 of section 5 can only be satisfied by the trivial result: all regions of parameter space lead to a single,


globally attracting, limit point. This is identical to the behaviour of the noisy neural nets analysed by Markov chain methods in [3]. The conclusion of the latter reference was pessimistic as to the possible use of such nets as pattern recognisers, etc. We do not feel at all despondent on this since if, as we strongly feel, living neural nets are reasonably well described by equations of the class (3.9), then there must indeed be some way in which an "intelligent machine" can utilise this fixed point behaviour.

One feature of the fixed points which also results from our analysis of the previous section is the observation that:

For general noisy nets of n N-RAMs (or equivalent neuronal units), the convergence to the unique stable fixed point is very rapid, occurring within three or so iterations for $n \gg N$.

A general justification of this result is that the r.h.s. of (3.9) involves products of factors, each of which is less than one, so ensuring rapid convergence to the fixed point under iteration.

We note that the specification of the degree of convergence is to one part in a thousand; this may be deemed excessive, and a lesser convergence criterion, say to one part in a hundred, would correspondingly speed up the process. The speed of convergence allows the evaluation of the order of over sixty different fixed points in the canonical second [18], given a conservative time unit equal to 5 ms. Such a rapid response is also relevant to following an input, regarded as producing a time variation of some of the net parameters (as briefly discussed in section 3). If the input varies only slowly in the time $4\tau$ then the resulting fixed point will be a well-defined function of the input. Thus the outputs can be regarded as a non-linear operation on the inputs. We hope to analyse the nature of the operation in more detail elsewhere.

Robustness under change is an important feature of the relevance of our results to either a better understanding of living neural nets or to the construction of intelligent nets of noisy RAMs. We have already considered various sorts of changes to the parameters of the net, and also the change of the degree of homogeneity; our results are robust under such changes. The most important feature of the evolution whose robustness we will have to consider with care is that of the synchrony of the updating. This is assumption (2) of section 4 on the noisy neural net equations, and was a similar basic assumption in the development of the noisy RAM-net equations in section 3. It is by now well understood that cyclic behaviour in nets of binary decision elements (or other sorts of units) can be considerably modified if asynchronous updating is chosen to determine the dynamical evolution [19]. This might be expected to be especially so here, where the state space is continuous (as compared to a finite discrete system), so that true chaotic behaviour can arise, instead of "random wandering".

We expect, however, that asynchronous updating will not change the unique fixed point or rapid convergence theorems. With respect to the rate of convergence, we note that as N increases the degree of the r.h.s. of (3.9) increases, so that the number of factors in each product increases, hence increasing the rate of convergence.

We may take some account of asynchronous updating by including (a) the time t_a^(ij) taken for a signal to travel from the jth to the ith unit (the suffix a denoting "axonal"), and (b) the time t_r^(i) taken by the ith unit to respond to its input (the suffix r denoting "response time"). Then eq. (3.9) is replaced by

$$u_j(t + t_r^{(j)}) = \sum_{\alpha} p_\alpha^{(j)} \prod_{k=1}^{N} \bigl[C^{(j)} u(t - t_a^{(j\,\cdot)})\bigr]_k^{\alpha_k}\, \bigl[\bar{C}^{(j)} u(t - t_a^{(j\,\cdot)})\bigr]_k^{\bar{\alpha}_k}, \qquad (8.1)$$


where u(t − t_a^(j·)) is the vector with components {u_k(t − t_a^(jk))}, 1 ≤ k ≤ N. We note that, following the discussion of eq. (3.9), we must now specify N[max_(j,k) t_a^(jk) + max_j t_r^(j)] independent probabilities at initial times in order to ensure that the r.h.s. of (8.1) contains independent probabilities at each subsequent time. We may generalise the r.h.s. of (8.1) to include a probability distribution over the axonal travel times, so bringing it close to the form of the polynomial equations for the noisy neural net in continuous time of [10]. The form of (8.1), even with integration over axonal travel times, may be used for iteration as in the discrete time case.
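A minimal sketch of how an update of the form (8.1) can be iterated with integer delays is given below; the two-unit multilinear map, the stored probabilities and the delay values t_r and t_a are all assumed for illustration, and a short history buffer of past output probabilities supplies the delayed inputs.

```python
# Sketch of a delayed iteration in the spirit of (8.1): unit i, evaluated at time t,
# reads u_j at time t - t_r[i] - t_a[i][j].  All numerical choices are assumptions.
import numpy as np

rng = np.random.default_rng(1)

t_r = [2, 1]                        # response times of units 1 and 2 (in time steps)
t_a = [[1, 0], [0, 1]]              # axonal delays from unit j to unit i
alpha = rng.random((2, 2, 2))       # alpha[i, a, b]: prob unit i fires when it reads (a, b)

def f(i, v1, v2):
    """PRAM-style update for unit i: stored probabilities averaged over address words."""
    return ((1 - v1) * (1 - v2) * alpha[i, 0, 0] + (1 - v1) * v2 * alpha[i, 0, 1]
            + v1 * (1 - v2) * alpha[i, 1, 0] + v1 * v2 * alpha[i, 1, 1])

T = 40
max_delay = max(max(t_a[0]), max(t_a[1])) + max(t_r)
u = np.empty((T + max_delay, 2))
u[:max_delay] = rng.random((max_delay, 2))   # independently specified initial probabilities

for t in range(max_delay, T + max_delay):
    for i in range(2):
        v1 = u[t - t_r[i] - t_a[i][0], 0]    # delayed input from unit 1
        v2 = u[t - t_r[i] - t_a[i][1], 1]    # delayed input from unit 2
        u[t, i] = f(i, v1, v2)

print("final probabilities:", u[-1])
```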

In the particularly simple case of the 2 × (N = 2) net of section 7(b), and with t_r^(1) = 2t_r^(2) = 2τ, t_a^(ij) = 0, eqs. (7.9) become

$$u_1(t + 2\tau) = \bigl(a_0 + a_1 u_1 + a_2 u_2 + a_3 u_1 u_2\bigr)(t), \qquad (8.2a)$$

$$u_2(t + 2\tau) = \bigl\{ b_0(1 + b_2) + u_1\bigl[b_1(1 + b_2) + b_0 b_3\bigr] + b_2^2 u_2 + b_1 b_3 u_1^2 + 2 b_2 b_3 u_1 u_2 + b_3^2 u_1^2 u_2 \bigr\}(t). \qquad (8.2b)$$

This mapping also satisfies the unique fixed point conjecture, at least as shown by computer simulation. This used randomised values of the a's and b's of (6.8), and of the initial values of u_1 and u_2, with no exiting for 10^4 trials. Each trial consisted of iterating (8.2) 50 times, with an exit if |u_i(52) − u_i(50)| > 0.01. We propose to make a more thorough test of the robustness of our results under asynchrony elsewhere.
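The sketch below reproduces the structure of such a test, but with the general synchronous two-unit multilinear map in place of (8.2) and with assumed random parameters; the trial count and the exit criterion follow the protocol just described.

```python
# Randomised test of the unique fixed point conjecture for a two-unit PRAM net.
# The synchronous multilinear map and the random alphas are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def run_trial():
    alpha = rng.random((2, 2, 2))             # random stored probabilities for each unit
    u = rng.random(2)                         # random initial output probabilities
    history = [u.copy()]
    for _ in range(52):
        v1, v2 = u
        u = np.array([
            ((1 - v1) * (1 - v2) * alpha[i, 0, 0] + (1 - v1) * v2 * alpha[i, 0, 1]
             + v1 * (1 - v2) * alpha[i, 1, 0] + v1 * v2 * alpha[i, 1, 1])
            for i in range(2)])
        history.append(u.copy())
    # "exit" (conjecture fails) if the iterates at steps 50 and 52 still differ
    return np.max(np.abs(history[52] - history[50])) > 0.01

exits = sum(run_trial() for _ in range(10_000))
print("trials that failed to settle:", exits)   # 0 expected if a unique fixed point is reached
```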

There are important questions which have still to be answered before noisy nets behaving in the manner we have discovered can be regarded as relevant either to artificial intelligence (A.I.) or to brain studies. The most important seems to be how the single fixed points are to be used to allow "intelligent" information processing to occur (where the fixed point will follow the input, at least for slow variations). Should the noisy net be built so as to be composed of Hebbian cell assemblies [20], say minicolumns [21]? Such a structural arrangement does seem to have some basis in fact in the cortex. A further question is as to the nature of the training that may allow adaptive behaviour; the recent work of Cooper et al. [22] appears relevant here. We hope to return elsewhere to the manner in which both A.I. and neural net analysis may benefit from the noisy nets we have discussed above.

Acknowledgements

We would like to thank D. Rand for a helpful correspondence, and one of us (J.G.T.) would like to thank Professor Salam of the International Centre for Theoretical Physics, Trieste, for hospitality while part of the work was carried out.

Appendix

Noisy 1-nets

The properties of networks of noisy 1-RAMs (single-input, four-function logic devices) have been considered in some detail by one of us [23]; we will present here only a brief discussion of such networks.


The analysis of 1-RAM nets is greatly facilitated by the fact that in this case the equation for the time evolution of the state probabilities P_x is linear, of the form (3.8),

$$P_x(t + 1) = T P_x(t), \qquad (A.1)$$

and so the techniques appropriate to a Markov process can be brought to bear on the problem. For an n × 1-RAM net, T is a 2^n × 2^n matrix, independent of the state probabilities P_x, and known as the transition matrix for the system. The structure of the state space (up to a relabelling of the states) is determined completely by the eigenvalues of T, and the work in ref. [23] aimed to discover those eigenvalues (and consequently state structures) for a general n × 1-RAM net.
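A minimal sketch of (A.1) for a single ring is the following; the state ordering (binary words, with unit i reading the previous output of unit i−1) and the random p's are assumptions made for illustration. It constructs T explicitly and lists its eigenvalues.

```python
# Transition matrix of a ring of n noisy 1-RAMs, under the convention P(t+1) = T P(t):
# column a is the current network state, row b the next one, so columns sum to one.
import numpy as np

rng = np.random.default_rng(3)
n = 3
p = rng.random((n, 2))              # p[i, a]: prob that unit i outputs 1 when its input is a

def transition_matrix(p):
    n = p.shape[0]
    size = 2 ** n
    T = np.zeros((size, size))
    for a in range(size):           # previous network state (binary word)
        a_bits = [(a >> i) & 1 for i in range(n)]
        for b in range(size):       # next network state
            b_bits = [(b >> i) & 1 for i in range(n)]
            prob = 1.0
            for i in range(n):
                q = p[i, a_bits[(i - 1) % n]]     # unit i reads unit i-1's old output
                prob *= q if b_bits[i] else 1.0 - q
            T[b, a] = prob
    return T

T = transition_matrix(p)
print("columns sum to one:", np.allclose(T.sum(axis=0), 1.0))
print("eigenvalue moduli:", np.sort(np.abs(np.linalg.eigvals(T)))[::-1])
```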

The networks considered were taken to be completely connected (no external inputs); such nets consist of an assembly of rings of lengths l_1, l_2, ..., l_r, where

$$\sum_{i=1}^{r} l_i = n \qquad (A.2)$$

(a 9-RAM example is given in fig. 6(a)). The transition matrix in such a network is given as the direct product

$$T = T_{l_1}^{(1)} \otimes T_{l_2}^{(2)} \otimes \cdots \otimes T_{l_r}^{(r)} = \bigotimes_{i=1}^{r} T_{l_i}^{(i)}, \qquad (A.3)$$

where T_n will denote the transition matrix associated with an n × 1-RAM ring structure (fig. 6(b)). Since the eigenvalues of T (and hence the state structure of the complete network) can be obtained by taking the Cartesian product of the eigenvalues of the Kronecker factors T_{l_i}, it is in general necessary to determine the characteristic equation satisfied by an n-ring transition matrix T_n.
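The following small check (using arbitrary column-stochastic stand-ins rather than actual ring transition matrices) illustrates the property being used here: the spectrum of a Kronecker product is the set of pairwise products of the factors' spectra.

```python
# Eigenvalues of a Kronecker product are products of the factors' eigenvalues.
# The two random column-stochastic matrices below are stand-ins, not ring matrices.
import numpy as np

rng = np.random.default_rng(4)

def random_column_stochastic(size):
    m = rng.random((size, size))
    return m / m.sum(axis=0)            # normalise each column so it sums to one

T1, T2 = random_column_stochastic(2), random_column_stochastic(4)
T = np.kron(T1, T2)

eig_product = np.array([a * b for a in np.linalg.eigvals(T1)
                              for b in np.linalg.eigvals(T2)])
eig_direct = np.linalg.eigvals(T)
# compare the two spectra as multisets (the ordering returned by eigvals is arbitrary)
matched = all(np.isclose(eig_direct, lam).any() for lam in eig_product)
print("spectra agree:", matched)
```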

However we will not (for reasons of space) deal with the general case here, but will instead summarize the results appropriate to a subset of the networks, the ones which evolve to a limiting probability distribution. This is justifiable since such nets comprise the vast majority, and in fact this form of behaviour was the only kind observed in our PRAM-net simulations, for generic (randomised) values of the network parameters.

It can be shown (see, for example, [24]) that any transition matrix must have λ = 1 as one of its eigenvalues. The associated eigenvector plays an important role; it can be shown [24] that if a transition matrix T is regular (some power of it is positive) then the limit

$$L = \lim_{k \to \infty} T^k \qquad (A.4)$$

exists, and the identical columns of L are given by the (normalized) eigenvector corresponding to λ = 1. As discussed above, for noisy 1-RAM nets we would expect T_n to be regular in almost all cases. A non-regular transition matrix would be associated with a ring whose parameters were on the boundary of p-space (deterministic, or with a strong deterministic component); only in these cases would we expect to observe the cyclic behaviour typical of the networks studied by Kauffman et al. [8].
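A short sketch of (A.4), with a strictly positive (hence regular) column-stochastic matrix standing in for T_n, verifies that high powers of T have identical columns equal to the normalized λ = 1 eigenvector.

```python
# The limit (A.4) for a regular transition matrix: T^k approaches a matrix L whose
# identical columns are the lambda = 1 eigenvector, normalised to sum to one.
# The random positive matrix below is a stand-in for a regular ring transition matrix.
import numpy as np

rng = np.random.default_rng(5)
size = 8
T = rng.random((size, size))
T /= T.sum(axis=0)                          # column-stochastic, strictly positive => regular

L = np.linalg.matrix_power(T, 200)          # approximate the limit of T^k

vals, vecs = np.linalg.eig(T)
v = np.real(vecs[:, np.argmax(np.real(vals))])   # eigenvector for the eigenvalue closest to 1
v /= v.sum()                                     # normalise so it is a probability vector

print("columns identical:", np.allclose(L, L[:, [0]]))
print("column equals the lambda = 1 eigenvector:", np.allclose(L[:, 0], v))
```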


The result, then, for a regular n × 1-RAM ring, is that the limiting form of the transition matrix is given by

$$L_n = \lim_{k \to \infty} T_n^{\,k} = \Bigl[1 - (-1)^n \prod_{i=1}^{n} \delta^{(i)}\Bigr]^{-1} M_n, \qquad (A.5)$$

where

$$\delta^{(i)} = p_0^{(i)} - p_1^{(i)}, \qquad (A.6)$$

and M_n is the 2^n × 2^n matrix whose columns are v_n, the eigenvector of T_n corresponding to the eigenvalue λ = 1. v_n is in turn given by the direct product

$$v_n = \bigotimes_{r=1}^{n} \begin{pmatrix} \bar{p}_0^{(r,n)} \\ p_0^{(r,n)} \end{pmatrix} \qquad (A.7a)$$

with

p~""~ E ( 1 ) - - po"' E = _ , ,p( , ,+t ) 8 ( j ) + + 8(k)

i - - I j - - i + l t = r + l

(- I) "+'- ' + ~ :-,+,I~I 8,,, + p~o°,),

(A.7b)

where \bar{p}_0^{(r,n)} is obtained from p_0^{(r,n)} by the replacement p^{(k)} → \bar{p}^{(k)} in the r.h.s. of (A.7b). A regular n-ring thus evolves to a fixed point in which the probability of occupation of the state

described by the binary vector a_n = (a_0, ..., a_{n-1}):

$$\lim_{k \to \infty} P_{a_n}(kt) = M_n(a_n, 0), \qquad (A.8)$$

is independent of the initial outputs of the n 1-RAMs, the matrix elements M_n(i, j) being determined by (A.5)-(A.7). An n × 1-RAM network composed of r regular rings, obeying (A.2), will have probability

$$\lim_{k \to \infty} P_a(kt) = \prod_{i=1}^{r} M_{l_i}(0, a_{l_i}) \qquad (A.9)$$

of evolving to the state a = (a_{l_1}, a_{l_2}, ..., a_{l_r}).

References

[1] See, for instance, D.E. Rumelhart, J.L. McClelland and the PDP Research Group, Parallel Distributed Processing (MIT Press, Cambridge, MA, 1986).

[2] W.S. McCulloch and W. Pitts, Bull. Math. Biophys. 5 (1943) 115.
[3] For a review of earlier work, see, for example, J.W. Clark, Statistical mechanics of neural networks, Phys. Rep. 158(2) (1988), and references therein.


[4] J.J. Hopfield, Proc. Natl. Acad. Sci. 79 (1982) 2554; J.J. Hopfield and D.W. Tank, Science 233 (1986) 625.
[5] I. Aleksander, A probabilistic logic neuron network for associative learning, IEEE Proc. 1st Ann. Int. Conf. on Neural Networks (June 1987); The logic of connectionist systems, IEEE "Computer" special issue on Neural Networks and Imperial Coll. preprint (1987).
[6] See, for example, Hao Bai-lin, Chaos (World Scientific, Singapore, 1984).
[6a] D. Gorse and J.G. Taylor, On the identity and properties of noisy neural and probabilistic RAM-nets, King's College preprint (January 1988).
[7] E.R. Caianiello, J. Theor. Biol. 1 (1961) 204.
[8] See, for example, S.A. Kauffman, J. Theor. Biol. 22 (1969) 437; I. Aleksander and P. Atlas, Intern. J. Neurosci. 6 (1973) 45; P. Atlas, Ph.D. thesis, Kent Univ. (1978), and references therein.
[9] J.G. Taylor, Noisy neural net states and their time evolution, King's College preprint (September 1987).
[10] J.G. Taylor, J. Theor. Biol. 36 (1972) 513.
[11] G.L. Shaw and R. Vasudevan, Math. Biosci. 21 (1974) 207.
[12] See, for example, V.I. Arnol'd, V.V. Kozlov and A.I. Neishtadt, Dynamical Systems I-III, Encyclopedia of Mathematical Sciences (Springer, Berlin, 1987).
[13] See, for example, the reviews of D. Whitley, Bull. Lond. Math. Soc. 15 (1983) 177; E. Ott, Rev. Mod. Phys. 53 (1981) 655; R.M. May, Nature 261 (1976) 459.
[14] R.M. May, Science 186 (1974) 645; J.R. Beddington, C.A. Free and J.H. Lawton, Nature 255 (1975) 58, and many later papers referred to in [6] and [13] above.
[15] M. Hénon, Commun. Math. Phys. 50 (1976) 69.
[16] J. Cronin, Fixed Points and Topological Degree in Nonlinear Analysis (Am. Math. Soc., Rhode Island, 1964).
[17] K.E. Kürten and J.W. Clark, Phys. Lett. A 114 (1986) 413; K.L. Babcock and R.M. Westervelt, Physica D 28 (1987) 305.
[18] J.A. Feldman, Cognitive Sci. 9 (1985) 1.
[19] R.O. Grondin, W. Porod, C.M. Loeffler and D.K. Ferry, Biol. Cyb. 49 (1983) 1.
[20] G. Palm, Biol. Cyb. 39 (1981) 181.
[21] L. Ingber, Phys. Rev. A 28 (1983) 395.
[22] C.L. Scofield and L.N. Cooper, Contemp. Phys. 26 (1985) 125; M.F. Bear, L.N. Cooper and F.F. Ebner, Science 237 (1987) 42.