Post on 04-Feb-2023
Imagination, Perception and Artificial Intelligence
by Kevin Karn (1988)
Master's Thesis, New York University, Dept. of Computer Science
Thesis advisor: Martin Davis
Thesis reviewed by: Ernest Davis
Contents
Preface
1. Methodology
2. Linearity
3. Hume and Kant
4. What the Apes Lack
5. The Image
6. Feature Rings
7. Hypotheses, Formation, Application
8. Conclusion
References
Preface

My thesis is that imagination is a linchpin of human perception (and thought), and therefore
must play a role in the science of artificial intelligence. Imagination, however, is an extremely complex
phenomenon, and what I have written could easily be extended in a variety of directions. I have not
attempted to take my analysis onto the level of exhaustive detail and computer implementation. This is
because the status of imagery has been, and still is to a great extent, so confused and precarious in
philosophy and the cognitive sciences that, I feel, the topic is best served by a general and yet thorough
demonstration of its reality and importance. I have attempted to develop a sound, empirically-grounded
framework upon which more detailed theories of the imagination can be erected.
I hope the reader, like me, will be pleasantly surprised by the not insignificant coherence of the
material I have compiled here.
1. Methodology
Frankenstein, in a word, is the tacit and true final goal of artificial intelligence (AI). We have to
consciously remind ourselves of this embarrassing fact in today's era of inch-by-inch progress in
ever-proliferating subfields, but the founders of AI were acutely aware of it. In fact in the early days,
many of them saw the Advent just around the corner, like Minsky [34] who ominously forecast in 1961
that "... we are on the threshold of an era that will be strongly influenced, and quite possibly dominated,
by problem solving machines." Prognostications of this sort, which seem absurd in the current research
climate, are common in the early AI literature, and have in fact been catalogued and ridiculed by
Dreyfus [9,10].
But let us by-pass the ridicule and instead highlight one of Dreyfus' basic and most constructive
questions: If the success of the AI enterprise seemed so assured and imminent at the beginning, why did
the program break down? Dreyfus, to his credit, gives a detailed answer to this question, and we shall
return to it in the conclusion (P. 80). But now I would like to say a few words on the failure and
methodology of AI.
First, it is patently obvious that to date we have no Frankenstein or even anything remotely
comparable, so measured against this standard, AI has been and remains a disappointment. This
disappointment can, of course, be tempered if you take the goal of AI to be bigger and better systems
for circumscribed industrial, commercial and military applications. But, as a point of philosophy, we
shall reject such a view in this paper.
So assuming our goal is to design and realize a true, all-purpose mechanical mind (i.e. Frankenstein),
it would seem wise to carefully choose a method. I call the default research paradigm in AI, which was
appropriated from other more successful sciences, Fragmentism. The strategy of this paradigm is to
break a complex phenomenon into well-defined fragments, model the fragments, and then somehow
eventually expand the fragments or put them back together. I believe a detailed and convincing case can
be made that the Fragmentism paradigm is, by its very nature, incapable of leading us to our goal.
I would, however, like to avoid the detail, since my main aim in this thesis is constructive not critical.
So here I briefly summarize my three main objections.
1) The mind is not a department store. Close inspection reveals numerous cases where one faculty
infuses another, or two faculties are symbiotic. For instance, perception of a certain visual pattern may
require action (in the form of eye movements) and a form of creativity (e.g. seeing faces in the clouds),
and so it is impossible for a perception unit to analyze the input and pass a description to action and
creativity units. It needs those latter units to make the description in the first place. The chronic failure
and frustration of trying to part out the mind may be telling us something. Maybe you cannot part it out;
maybe you have to grasp it somehow in toto.
2) "Ad hoc" is a pejorative term in the AI literature for machines which lack generality and work
only in a fixed range of situations. This is regarded as inexpedient because human thought has no
similar restrictions, and thus such machines fail as comprehensive models. But the methodology itself
of AI is ad hoc: systems are developed by circumscribing well-defined behaviors and then constructing
the machinery to perform them. So unsurprisingly no one ever attains the desired generality because no
one ever sets out to attain it. Our plan relies overly on fortune if we expect a machine designed for a
limited task to suddenly reveal itself as having total human generality.
3) AI is, almost by definition, a collection of tools, none of which does the intended job. So I
suppose it is inevitable that a common default view in AI is that "... intelligence is a kludge; people have
so many ad hoc approaches to so many different activities that no universal principles can be found."
(Sowa [45], P. 23) Patrick Winston and Michael Brady, in their Foreword for the MIT Press Series in
Artificial Intelligence (see Dyer [11], P. xii) write: "Unfortunately, a definition of intelligence seems
impossible at the moment because intelligence appears to be an amalgam of so many
information-processing and information-representation abilities." Needless to say, the disorganized tool
box nature of the field bears a suspicious similarity to the posited disorganized tool box nature of the
mind; almost as though the phenomenon is being modified to conform to the explanation rather than
vice versa.
There is something else fishy about this. If the goal of AI is to create intelligence, how do we know
whether we are succeeding or even what we are doing if we do not know what intelligence is? Actually
there is a widespread, parochial notion of intelligence in AI; it is something that is involved in things
like chess playing, theorem proving, IQ tests and so on, and not involved in things like spitting contests
for example. Researchers erroneously believe that they have endowed a machine with intelligence when
they have programmed it to perform some task which in the public imagination is thought to require
"brains." The problem here is that intelligence resides not in what you do but in how you do it. Playing
chess is not intelligent; playing chess intelligently is intelligent. The central problem of AI is not, as is
commonly assumed, the development of heuristics to prune mindless exponential searching. Rather it is
how to endow a computer with the ability to see into the structure of a problem, so that it need do no
mindless searching at all. Hence, insofar as we are aiming for Frankenstein, we must seek a theoretical
advance, and this in turn requires insight into the nature of intelligence.
Now we have only one conspicuous prototype for intelligence (the human being) so the task is
imitation, and man's experience with art provides an important lesson. Even a cursory examination of
the art of diverse places and times reveals the chronic tendency of man to adopt traditional stylizations
of what he sees, art which reflects a dogmatic rather than supple and open-eyed approach to the world.
The battle to see what is there, instead of what should be there or what we want to be there or what has
been authorized or passed down or codified, is an endless one, and our only weapon against blindness is
to go to the thing and look at it. (Zu den Sachen! "To the things" — The rallying cry of the
phenomenologists.) A man cannot force and bludgeon his way from blindness to sight; he may only
assume the submissiveness to the object which marks a sincere observer and entreat the lightning to
strike.
This idea, that one must approach and study an object without pre-supposition before embarking on
its imitation, seems almost a truism, and yet in the existing work on artificial intelligence it is almost
entirely lacking. For to gain a comprehensive, rich and ordered view of the mind would take one into
the very bowels of philosophy and psychology, far afield from the piece-meal projects and
programming exercises which define AI. So one finds researchers breaking out their algorithms,
equations and well-defined domains on Page 1. This reflects a misguided fascination with tools over the
task itself, much like some writers whose ardor for jargon and wordplay overrides the basic task of
writing, i.e., to say something. I do not mean by this that formalism is misguided or inapplicable. Any
explanatory theory must be deductive; we must show how the widest possible range of phenomena
logically follow from the smallest possible set of basic principles. But mathematics and computer
programs are bewitching sirens and bottomless pits from which we must rein in our minds. We must
make it a conditioned reflex to maintain the tether to our patron saint Frankenstein and thereby avoid
entanglement in the branches of infinitely extendable knowledge. The correct path is not to spend 15
minutes gaining a fragmentary and brittle image of man and then spend 15 years exploring the formal
structure implied by that image. We must instead exercise patience and develop a lucid, whole image
from which our formal needs can be ascertained.
2. Linearity

In approaching the mind, we would like, as with any object of study, to uncover a penetrating yet
constrictive idea of its fundamental nature; we require a framework wherein we may elaborate. The
subject of the mind in particular demands such limits, for the mind is an omnivorous, infinitely creative
thing which effortlessly leads the unwary out-of-bounds, diverting investigation away from the creative
mechanism itself and into the bottomless pit of the mechanism's products.
The mind is, in a sense, a window onto the external world, a window so clear that men took
thousands of years to even realize it was there. Primitive men as well as most modern people are so
intimately involved with the world that they feel themselves beyond the window, outside. This is the
"objective consciousness" of Merleau-Ponty [32], a devious consciousness which obscures its own
origins, and serious thinkers have been as susceptible to its allure as anyone else. And rightfully so, for
as Merleau-Ponty notes (quoting Scheler): "Nothing is more difficult than to know precisely what we
see. 'There is in natural intuition a sort of "crypto-mechanism" which we have to break in order to reach
phenomenal being' or again a dialectic whereby perception hides itself from itself." (P. 58) We can
regard the world as having two poles: objectivity and subjectivity. If I swing to the objective pole, I
view myself in the "third-person" so to speak; I see the world as expanding out arbitrarily in the spatial
and temporal dimensions, and I am just an object among objects. If, on the other hand, I swing to the
subjective pole, I stand on the cogito of Descartes' famous axiom: I am not just an object among
objects; I am, in Merleau-Ponty's phrase, "the absolute source," and I wonder how I could ever think
otherwise. Nothing is more obvious than the fact that I am always situated in a limited, egocentric
perspective. So how could it be that I not only transcend these apparent limits, but transcend them with
such paradoxical ease and have the greatest difficulty even seeing them? This difficulty is Scheler's
"crypto-mechanism," and its power is attested to by the fact that almost 2,000 years lie between the
birth of Western philosophy in the figure of Socrates, and the first true grasp of the subjective viewpoint
with Descartes. An underlying ambition of Merleau-Ponty's philosophy is to rectify the oft-distorted
relationship between subjectivity and objectivity. He writes:
"Scientific points of view, according to which my existence is a moment of the world's, are
always both naive and at the same time dishonest, because they take for granted, without explicitly mentioning it, the other point of view, namely that of consciousness, through which from the outset a world forms itself round me and begins to exist for me." (P. ix)
In other words, subjectivity is prior to objectivity, both conceptually and ontogenetically. For firstly,
all our knowledge of the world (including science) is derived from and refers back to the situated
perspective of consciousness. And secondly, as Piaget [38] has experimentally demonstrated, a child is
not born with fully developed notions of external objects, space and so on. Rather, an infant begins with
an egocentric, syncretic* perspective from which objective consciousness is constructed step-by-step†.
The reader may feel that here I am trading in philosophical quibbles which have nothing to do with AI,
and that we should steer toward more conventional themes. But such a move has been and would be a
grave mistake. To see this consider two examples:
1) In the area of knowledge representation, the general AI strategy is to encode knowledge into what
are known in the parlance as schemata. For example, a schema for BUS would be an elucidation, made
explicit through brain-storming, of what a bus is. A bus has four wheels, carries people, stops at bus
stops, has a driver, moves at less than the speed of sound, can plunge or wreck and kill its riders, costs
money to ride, needs fuel to run, emits exhaust... etc. This information is coded up into a data structure
which a computer can then access and operate on. The problem is that this in no way tells us how such
knowledge is constructed from, and brought to bear during, direct sensual contact with the world. It is a
sort of third-person objectification of a bus—an empty symbol whose meaning ("BUS as it appears to
me in my experience") is not represented for the computer, and thus can serve no practical function in
interfacing with the world. This results from ignoring the conceptual priority of subjectivity, that is,
trying to code up the world starting from how it is in-itself, rather than how it appears to a situated observer ("me").

* Syncretism is a term, introduced by M. Claparède, for a well-known trait of child thought: that happenstance juxtapositions in the child's experience are mistakenly taken to be objective.
† A tremendous amount of follow-up work on this topic has confirmed Piaget's conclusion. Flavell [13] writes: "Virtually everyone now agrees with Piaget that the infant is not born with the object concept and therefore must somehow acquire it. Because it is so counterintuitive that any living creature could lack an object concept, this agreement is a very important scientific achievement." (P. 40)
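To make the target of this criticism concrete, the kind of BUS schema described above might be coded up as follows. This is a minimal sketch in Python; the slot names and structure are my own illustration, not drawn from any particular system:

```python
# A minimal frame-style schema for BUS, of the kind criticized above.
# All slot names are illustrative, not taken from any existing system.
bus_schema = {
    "isa": "vehicle",
    "wheels": 4,
    "carries": "people",
    "stops_at": "bus stops",
    "has_driver": True,
    "max_speed": "less than the speed of sound",
    "needs": "fuel",
    "emits": "exhaust",
    "ride_costs": "money",
}

def lookup(schema, slot):
    """Retrieve a slot value; symbol lookup is the only 'access' such a structure affords."""
    return schema.get(slot)
```

Note that nothing in the structure links the symbol BUS to the experience of a bus; the meaning lives entirely in the programmer's head, which is precisely the objection raised above.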
2) In robotics and other areas where a computer must deal with an environment which changes with
time there is a deep difficulty (called the "frame problem") in maintaining a data base which models the
environment. The problem is that facts in the data base may interact in complex ways, so that a change
in one fact (through, say, a robot's action) has an unpredictable effect on other facts, and the whole data
base has to be recomputed. Moreover, ridiculous numbers of so-called "frame axioms" must be
introduced to allow the computer to deduce that changing one fact has no effect on various other facts.
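The scale of the difficulty can be sketched with a toy example. The actions, facts and axiom template below are hypothetical, purely to illustrate the counting: with A actions and F facts, a naive axiomatization needs on the order of A × F frame axioms just to say that nothing else changed.

```python
# Toy illustration of the frame-axiom blow-up. The actions, facts and the
# axiom template are hypothetical; only the multiplicative count matters.
actions = ["move(robot, roomA, roomB)", "pickup(robot, toy)", "drop(robot, toy)"]
facts = ["color(toy, red)", "weight(toy, heavy)", "location(chair, roomA)"]

# Naively, every (action, fact) pair needs its own "nothing changed" axiom:
frame_axioms = [
    f"{fact} still holds after {action}"
    for action in actions
    for fact in facts
]

assert len(frame_axioms) == len(actions) * len(facts)  # 3 x 3 = 9 axioms already
```

Each new action or fact multiplies the axiom count, which is why the ridiculous numbers mentioned above appear so quickly.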
One tempting way to solve this problem would be to make the database more like the world. For
example we might have a robot represent its playpen as a 3-dimensional matrix, with itself, its limbs
and its toys all indicated therein. We might even adopt some mechanism whereby objects which are
dropped in the model fall to earth just as they do in real life and so on. The ideal then would be for the
robot to have a model of the world which is exactly like the world; and we might rightly call this the
ultimate spawn of the objective viewpoint. So what is wrong with this picture? As Pylyshyn [38] has
pointed out, the problem is that we are assuming what we are trying to explain. As he puts it:
"...if the representation is too similar to the world it represents, it is of no help in apprehending the
world, since it merely moves the problem in one layer..." (P. 40)
In a related connection, he notes that:
"...[mental representations must in some way be similar to real objects for] otherwise thought would be irrelevant to action, and our chances of survival would be negligible. From this one is tempted to say that representations and the objects they represent must have much in common. Beginning with this innocent remark we are irresistibly and imperceptibly drawn towards the fatal error of attributing more and more of the properties of the environment, as described in the physical sciences, to the representation itself." (P. 38)
After Quine, Pylyshyn calls this temptation "objective pull." Succumbing to this pull amounts to
putting the objective before the subjective—i.e. trying to understand the mind in terms which it is the
mind's job to construct. Merleau-Ponty shows how philosophical understanding has been historically
undermined by this mistake, and inasmuch as AI often unconsciously follows in the footsteps of old
philosophy, it too has been and is susceptible.
Having produced these examples (which could easily be multiplied) indicating the relevance of the
objective-subjective distinction to the AI enterprise, let us look to neglected subjectivity, and see
whether it has something to teach us.
In conversing with a friend over coffee, I have a strong tendency to feel, when I look at him, that he
is completely present before me, that he is a simultaneously given, unitary thing. If someone were to
ask me "What are you looking at?" I would unhesitatingly reply "Mike." This is the most natural reply
in the world, and reflects the sovereignty of the objective viewpoint in our common sense. But closer
scrutiny reveals that something more complex is going on. In looking at Mike, it is impossible for me to
bring him entirely within my foveal vision at one time, and so my eyes are constantly darting over him.
He makes an expressive gesture, calling my eyes to his hand, and then raises his eyebrows for sarcastic
effect, drawing my gaze back to his face. He spins to look behind him, showing me the back of his head.
His legs and hips are completely hidden from me by the table. So when I say that I see "Mike," I am
being somewhat misleading. It would be more proper to say that I see a sequence of aspects of Mike; a
complex temporal flurry of impressions more than a unitary thing. This is not a new observation. For
example, Hebb [19] stresses that: "The percept of any but the simplest object cannot be regarded as a
static pattern of activity isomorphic with the perceived object but must be a sequentially organized or
temporal pattern." (P. 469)
We began this chapter in search of a pithy characterization of mind, and with this last observation it
falls into our lap. The mind is a sequence of moments in time. That is, the mind is a strictly linear thread,
and, as much as we like to believe the contrary, we cannot have a bulge in the thread—a spatial cavity
wherein we can stop time and put the pieces of the world together into a coherent, bird's-eye view. This
thread is composed of moments, and the reader need only reflect for a moment to realize how
insubstantial and partial a single moment truly is. Beware of the "objective pull" here. I have sometimes
asked people to describe for me their idea of a moment of time. Invariably they conjure up allusions to
the continuum, with moments densely packed, always another between any given two. But this is not at
all what I have in mind. These people are stepping out of themselves and looking at time in the
third-person. What I mean by a moment is not the scientific view, but the personal subjective view for
which the scientific notion of a point on the continuum is mere short-hand or sign-language. These
subjective moments are difficult to describe, but perhaps we can say that they, like visual fixations,
seem to have a crisp (albeit fleeting) center, surrounded by a more nebulous and semi-conscious edge.
As noted above, the schema is the structure of choice in AI for representing knowledge and
organizing memory, and a problem with schemata is that they are divorced from our concrete, sensual
existence in time. If these schemata are to be an accurate model of human memory, then surely we must
specify how they get into the mind via time, and how they are applied by the mind in time. The lack of
any general, accepted specification of this type is a major defect of the schema model. But perhaps our
newfound view of the mind as a traffic or stream of moments in time can help us with this. That is, is it
possible that our memory structure is a reinstatement of our perceptual life? Could it be that perceptual
moments are somehow recorded in temporal sequence in our brains? Surprisingly, there is extremely
strong evidence for such a view. Penfield and Roberts [37] have demonstrated, through electrode
stimulation of patients undergoing brain surgery, that memories are somehow stored in strips like
motion pictures in the brain's temporal lobes. The authors characterize the elicited flashbacks as:
"...a little like the performance of a wire recorder or a strip of cinematographic film on which are registered all those things of which the individual was once aware—the things he selected for his attention in that interval of time. Absent from it are the sensations he ignored, the talk he did not heed." (P. 53)
Penfield and Roberts also note that the flashbacks were:
"...for the most part, quite unimportant moments in the patient's life; standing on a street corner, hearing a mother call her child, taking part in a conversation, listening to a little boy as he played in the yard..." (P. 53)
On this basis they speculate that perhaps very little of a person's life is omitted from the on-going
stream-of-consciousness record.
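The hypothesis that memory reinstates the perceptual stream can be caricatured in a few lines of code. This is a toy sketch under that assumption; nothing here is meant as a model of the temporal-lobe mechanism itself:

```python
from dataclasses import dataclass, field

@dataclass
class Moment:
    """One momentary impression: what was attended to, and when."""
    time: int
    impression: str

@dataclass
class StreamMemory:
    """Memory as a strictly linear, ever-growing record of moments."""
    record: list = field(default_factory=list)

    def perceive(self, impression):
        # The record only grows forward; there is no "bulge in the thread".
        self.record.append(Moment(time=len(self.record), impression=impression))

    def replay(self, start, end):
        # Re-eliciting a stretch of the record, like Penfield's flashbacks.
        return [m.impression for m in self.record[start:end]]
```

Perceiving a series of quite unimportant moments and then replaying a stretch of the record returns them in their original temporal order, and only those moments that were actually attended to ever enter the record.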
There is another important dividend of the view that the mind is a sequence of moments in time,
namely a new perspective on concepts. What is a concept? As a first approximation we might say: a
general idea as opposed to a particular individual. But under pressure this approximation unravels,
revealing the apparent opposition to be illusory. For consider the concept of, say, "dog." It is often said
that this concept is what allows us to see the variety of individual, particular dogs as falling under a
single type. Also, the reason we need the concept to do this is that different breeds and individual dogs
appear differently and so we cannot do template matching. But is it not true that a single particular dog
"Lassie" can appear to us, in time, in almost an infinite variety of ways? So how do we recognize all
these impressions as falling under a single type, that is, as impressions of "Lassie"? The point here is that
particulars are universals, just finer grained. Returning to my earlier example of conversing with my
friend Mike, we see that my various eye fixations, and the various facets of Mike which they might
reveal, must somehow be placed or interpreted as coming from Mike. We are all naive Platonists on the
level of objects, positing the existence of a realm of unitary, unchanging, and thus ideal, things as the
reality behind the shadows of our sensual spectacle. The real Mike or Mike in-himself, is such an idea,
and slightly misappropriating a term from Kant I call this thing in-itself, as opposed to its appearance, a
noumenon.
A hinted-at side-effect of this view that particulars are universals is that it shows how deeply
interpretation, ambiguity and context are woven into our experience. As noted above, sometimes we
must attribute different impressions to the same noumenon. But there are also many instances in our
journey through time when we are confronted with similar moments. For example, if I reach down and
stroke a dog, and then later a cat enters the room and I reach down and stroke it, these two momentary
impressions may be virtually the same. But nevertheless, I interpret one of these impressions as from a
dog, and another as from a cat, on the basis of context. So it is a feature of our minds that we create
noumena to lie behind and account for the ever-changing fragmentary flux of the moments of time we
confront. And since sometimes these moments may resemble each other, we have to disambiguate or
interpret them in order to place them with their correct noumena.
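A toy sketch may help fix the idea (the names and the crude matching scheme are hypothetical): the same impression is placed with different noumena purely on the basis of context.

```python
# Toy sketch of context-based interpretation: a momentary impression that
# underdetermines its source is assigned to a noumenon by context.
# All names and the matching scheme here are hypothetical.
def place_with_noumenon(impression, context):
    """Interpret an ambiguous impression against the noumena currently in play."""
    for noumenon in context["present_noumena"]:
        if impression in noumenon["typical_impressions"]:
            return noumenon["name"]
    return None  # no noumenon accounts for the impression

dog = {"name": "dog", "typical_impressions": {"soft fur", "barking"}}
cat = {"name": "cat", "typical_impressions": {"soft fur", "purring"}}

# The same tactile impression, two contexts, two interpretations:
assert place_with_noumenon("soft fur", {"present_noumena": [dog]}) == "dog"
assert place_with_noumenon("soft fur", {"present_noumena": [cat]}) == "cat"
```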
But perhaps the pivotal question provoked by thought along these lines is this: What faculty of the
mind allows or causes us to create noumena—self-identical, ideal units beyond perception—so that
interpretation of impressions becomes possible? Or to put it another way, what faculty allows the mind
3. Hume and Kant*

Imagination... that is how both Hume and Kant answer the question concluding the previous chapter.
It is imagination which allows us to rise above the fragmentary and ever-changing march of our
momentary impressions, and thereby construct and inhabit an objective world of external noumena.
Further, both these philosophers believe that, in the above capacity, imagination plays an essential role
in ordinary perception itself. This last view is surprising in two ways. First, it places a heavy functional
burden on imagination, a faculty which is often considered to be rather frothy and frivolous. And
second, it entails that higher level thought is constantly being brought to bear even during simple
perception, so thought and perception cannot be rigidly demarcated and studied in isolation. Therefore,
if Hume and Kant are right, there is little hope for the AI strategy of using, say, a self-contained vision
subsystem to create structured descriptions for higher level processing. So in this chapter we will
consider the views of these two philosophers, and try to form a rough idea of how imagination could
function as an ingredient of perception.
Hume [21] has a simple view of imagination which seems to accord well with common sense. He
divides all perceptions of the mind into two classes, impressions and ideas, the difference between the
two being one of "vivacity." He writes:
"Those perceptions, which enter with the most force and violence, we may name impressions; and under this name I comprehend all our sensations, passions and emotions, as they make their first appearance in the soul. By ideas I mean the faint images of these used in thinking and reasoning..." (P. 1)
Hume, on the whole, regards an image (his "idea") as a copy (like a photograph or audio recording)
of a previous stimulus, or in Hebb's phrase, "as a static pattern of activity isomorphic to the perceived
object." This view is alluring, but in the end flawed, and since we shall be referring to it in later analysis,
I propose to call it the "Humean Image."
Hume divides the human mind into three basic faculties—sense, reason and imagination—and his
famous argument asks: To which of these faculties can we ascribe the belief in external mind-independent objects?

* I am indebted to Strawson [47] and Warnock [51] for the discovery of this material.

His first step is to distinguish between belief in the continued existence
versus belief in the distinct or independent existence of external objects. Clearly these two forms are
equivalent:
"For if the objects of our senses continue to exist, even when they are not perceiv'd, their existence is of course independent of and distinct from the perception; and vice versa, if their existence be independent of the perception and distinct from it, they must continue to exist, even tho' they be not perceiv'd." (P. 188)
It is not really evident that this is an important distinction, but it is a part of Hume's argument, so we
point it out.
He dismisses as ridiculous and self-contradictory the notion that the senses could engender the belief
in continued existence. For that would imply that we somehow sensed something while we were not
sensing it. He then proceeds to ask whether the senses allow us to directly perceive things as distinct.
He dismisses this possibility, for:
"When the mind looks farther than what immediately appears to it, its conclusions can never be put to the account of the senses; and it certainly looks farther, when from a single perception it infers a double existence, and supposes the relations of resemblance and causation between them." (P. 189)
We may also recount here, in further support of Hume, Piaget's [38] demonstration that very young
children (under the age of about 8 months) behave as though an object hidden under a blanket has
virtually vanished into thin air. If in fact inexperienced children see an ardently desired object as
distinct, and distinctness is equivalent to continuity, why do they immediately forget the object when it
falls from sight?
Having disposed of sense, Hume next considers reason. Unfortunately his argument is rather
confused, and perhaps marred by a lack of clarity on the precise nature and scope of what he calls
"reason." But we can isolate his main points:
1. The "vulgar" (that is, those without a philosophical cast of mind) attribute colors, sounds and the
like to objects themselves. This belief is so strong that "...when the contrary opinion is advanc'd by
modern philosophers, people imagine they can almost refute it from their feeling and experience, and
that their very senses contradict this philosophy." (P. 192) But this belief is false, and therefore cannot
arise from reason.
2. In Hume's view, all reasoning must rely on the memory of sequential juxtaposition of cause and
effect in experience. Since a noumenon is not available to the senses, being by definition what is
beyond sense, such a juxtaposition can never be observed.
3. The belief in objects is so essential to our survival that nature will not allow it to depend on such a
feeble and error-prone faculty as reason. No matter what sophistical devices we use to convince
ourselves of the unreality of the external world, when we emerge from our study, we jump out of the
way of an oncoming carriage. In short, the belief in external objects is deeper than reason.
So, by the principle of elimination, we are left with imagination.
Hume's account of how imagination functions in building a world of objects is renowned for its
complexity and lingering pockets of implausibility, but parts of it are persuasive.
First of all, Hume describes two features of our experience, constancy and coherence, which
imagination preys on in constructing external objects. Constancy means that certain "pictures" in our
experience, especially of stationary unchanging objects like mountains, houses, trees and so on, tend to
recur in a uniform way. Coherence means that even those things which change tend to change in regular,
predictable ways. For example, a fire which I left burning may have burnt out when I return, thus losing
constancy. But I have seen such things happen before and may even have predicted it, so the experience
retains coherence. Hume also claims that imagination possesses a sort of inertia so that:
"...when set into any train of thinking, [it] is apt to continue, even when its object fails it, and like a galley set in motion by the oars, carries on its course without any new impulse." (P. 198)
So the imagination, when confronted with the suggestive but incomplete coherence of our experience,
moves to complete it. Also, it must be imagination which does this job because the requisite ideal
completeness is never encountered in our experience, and thus is "unreal." So it is as though we were
each scientists, poring over reams of data, searching for an underlying and unifying cause for the
numerical perturbations. We make various hypotheses by imagining, striving always for the simplest
and most explanatory. The best hypothesis then becomes a noumenon, the thing behind the data which
is showing us its various faces, and we believe in its reality.
This much is plausible and we shall make use of it, but for Hume it is not enough. Since he defines
belief, like just about everything else, in terms of "force" and "vivacity," he feels that the above
argument in terms of coherence is "... too weak to support alone so vast an edifice, as is that of the
continu'd existence of all bodies..." (P.198-199) So he provides a complicated additional argument,
based on constancy, which may be précised as follows.
The notion of identity can only arise when a single object has multiple manifestations, and the only
way this can happen is through time. Hume regards the contemplation of the same, constant object over
a duration of time as the prototypical source of the notion of identity.
This notion is diffused through the agency of resemblance. The mind has a tendency to confuse
resembling things with identical things. For example, a mathematical proof involving two functions on
the same set may be difficult to understand because we keep confounding the two. So when we view the
sun again after an interval during which it has been hidden, we believe we are seeing the same thing due
to a dual resemblance. First, the current picture of the sun resembles a previous one, and second, this
experience reminds us of viewing a constant object which has been hidden. So we "feign" a belief in a
continued object, and this belief derives its requisite "vivacity" from impressions, from which it more or
less rubs off. Hume then notes that, since he has accounted for the belief in continued existence, the
belief in distinct existence falls out, and his system is complete.
Let us take stock of what we have learned from Hume.
1. The Humean Image: The Humean Image itself has problems we will discuss later. But, in this
connection, Hume's view implies* that we entertain a Humean image when we recognize two
impressions of the sun as of the same thing. But looking to our subjective experience, we see this is not
* See Warnock [51] P. 135-136 for a justification of this interpretation of Hume.
true. When viewing an object after an interval during which it has been hidden, I do not seem to retrieve
a memory image which I keep in the back of my mind and compare or confuse with the present
impression.
2. The Senses: Hume's argument on the senses is sound. We must look internally for the source of
the noumena.
3. Belief (cf. the "crypto-mechanism"): His notion of belief as "vivacity" is unsatisfying, but he is on
the right track when he claims that belief in objects is deep-rooted, deeper than reason. Jaspers [23], in
his text on psychopathology, stresses the emotional roots of schizophrenia, and points out that
rationality and delusional disconnection from reality can exist side by side. As he puts it: "The critical
faculty is not obliterated but put into the service of the delusion. The patient thinks, tests arguments and
counterarguments in the same way as if he were well." (P. 97)
4. Explanation: There is a close affinity between positing noumena as an explanation or "theory" for
our disconnected journey of moments in time, and creating scientific concepts like gravity or quarks to
explain empirical data. (cf. Chapter 7)
5. Identity: The concept of identity requires that we see the same thing in different manifestations,
and this in turn requires some notion of objective time. For identity means "different, but the same," and
there is no way to be aware of the difference if we have no concrete idea of a time other than now. And
to have an idea of a time other than now requires imagination (memory) for we are making what is
absent present. So imagination is a prerequisite for any notion of identity, and thus of external objects.*
Let us now examine Kant's perspective on the imagination.
On this topic, Kant (true to form) owes a great deal to Hume, and yet supersedes him. Like Hume, he
claims that sense, on its own, is incapable of providing us with a world of independent objects, and sees
imagination at work in our very perception.
He writes:
"Now, since every appearance contains a manifold, and since different perceptions therefore occur in the mind separately and singly, a combination of them, such as they cannot have in
* This is an analytic argument and follows from the meaning of the terms.
sense itself, is demanded. There must therefore exist in us an active faculty for the synthesis of this manifold. To this faculty I give the title, imagination." (Kant [24], P.144)
And this is further reinforced in a note on the above section:
"Psychologists have hitherto failed to realise that the imagination is a necessary ingredient of perception itself. This is due partly to the fact that that faculty has been limited to reproduction, partly to the belief that the senses not only supply impressions but also combine them so as to generate images of objects. For that purpose something more than the mere receptivity of impressions is undoubtedly required, namely, a function for the synthesis of them." (P.144n)
Thus Kant regards sense as passive and unequipped to organize the welter of data which impinges on
it; and since some internal faculty must provide the organization of sense impressions which obviously
occurs (i.e. we are unitary minds living in a world of objects, not serial bombardments of dumb
impressions), he singles out imagination. This is already an advance over Hume. For Hume believed, in
rough caricature, that imagination creates for us a world of objects through a sort of deceit or confusion.
When we see the sun again in the morning, this kicks up a static Humean image of the sun we saw
yesterday, and the resemblance tricks us into believing we are seeing a self-same individual. Kant
rejects this. He sees imagination rather as an active, organizing power working internally within the
perception.
This reflects another of his innovations. Whereas Hume takes the imagination to be essentially
passive (a sort of mental photography), Kant divides the imagination into two types: a passive form,
which he calls reproductive or empirical, and an active form, which he calls productive or
transcendental. Like Hume, Kant has a three-fold division of the psyche (into sense, imagination and
understanding), and imagination, in its two forms, acts to mediate between the extremes of sense and
understanding. Imagination lends to sense the synthesizing action of the understanding, without which
sense would be chaotic and mindless, and to understanding the concrete material of sense, without
which understanding would be empty of meaning and divorced from reality. This bivalent view of the
imagination is highly original in the Western philosophical tradition, anticipated only perhaps by
Plotinus and Avicenna (see Casey [5] P.131-132).
The mediation of sense and understanding by imagination has two faces. On the one hand the
passive imagination is responsible for working impressions into images, a process which Kant calls
"apprehension." But this process is not simply a matter of photography:
"But it is clear that even this apprehension of the manifold would not by itself produce an image and a connection of the impressions, were it not that there exists a subjective ground which leads the mind to reinstate a preceding perception alongside a subsequent perception to which it has passed, and so to form a whole series of perceptions." (P.144)
Thus Kant anticipates our experimentally confirmed guess that impressions are recorded in
sequential strips.
The active imagination, on the other hand, is responsible for subsuming sequences of impressions
under concepts of the understanding, i.e. performing a sort of "pattern recognition." This is achieved
through what Kant calls a "schema." One might feel that we have a conflict of terminology here with
the AI schema, but in fact the two words largely refer to the same thing, and the latter use may derive
from the former.* Kant defines the schema, in contrast to the image:
"If five points be set alongside one another, thus, ..... , I have an image of the number five. But if, on the other hand, I think only of a number in general, whether it be five or a hundred, this thought is rather the representation of a method whereby a multiplicity, for instance a thousand, may be represented in an image in conformity with a certain concept, than the image itself. For with such a number as a thousand the image can hardly be surveyed and compared with the concept. This representation of a universal procedure of imagination in providing an image for a concept, I entitle the schema of this concept." (P.182)
The schema, then, is a rule or procedure by which we can, if we so desire, produce images
(originally fashioned by the passive imagination) subsumed under a certain concept. Thus the schema is
a sort of generalized image, and marks an advance over its Humean counterpart. Moreover, it is by
* The principal difference is that the AI schema is a static collection of declarative facts with no connection to imagination, while Kant's schema is the specification of a procedure which can be used to produce images. In model-theoretic terms, both schemas are theories, but Kant's schema has the additional capacity to produce appropriate models.
means of the schema that the active imagination organizes our sense impressions. Unfortunately Kant,
like AI, is silent on how precisely this occurs.
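The contrast drawn in the footnote above, between the static AI schema and Kant's schema as a generative procedure, can be made concrete with a toy sketch. Everything here is an illustrative invention of my own (the names, the dictionary of facts, the particular rule), not an established AI formalism:

```python
# Toy contrast between an "AI schema" (a static collection of
# declarative facts) and a Kantian schema (a procedure for producing
# concrete instances -- "images" -- falling under a concept).
# All names and structures here are illustrative inventions.

# AI-style schema: a theory as a bag of declarative facts.
ai_schema_triangle = {
    "sides": 3,
    "angle_sum_degrees": 180,
    "closed": True,
}

def kantian_schema_triangle(scale=1.0):
    """A rule that, given a scale, yields an 'image': three concrete
    vertex coordinates satisfying the concept 'triangle'."""
    return [(0.0, 0.0), (scale, 0.0), (scale / 2, scale * 0.866)]

# The declarative schema can only be inspected; the Kantian schema
# can also be *run* to produce a model of the concept.
print(ai_schema_triangle["sides"])   # 3
print(kantian_schema_triangle(2.0))  # a concrete triangle at scale 2
```

In model-theoretic terms, both structures encode a theory of the triangle, but only the second can generate models of it on demand, which is the capacity the footnote attributes to Kant's schema.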
But we can clarify this somewhat by considering a line of thought due to Strawson [47]. Suppose
that on the street you meet a person who strikes you as familiar, but whom you cannot quite place. Then
suddenly something clicks and you recognize who it is. This is a peculiar experience if you scrutinize it
closely; it is almost as though you can see the old face in the new. Strawson struggles for words to
describe this, saying the past perception is "alive in" or "infuses" the present perception. He stresses
moreover that this is not a matter of calling up a Humean image which is compared with the present
perception. For we can very well, recognize a face as familiar without any specific memory of the
situation where we encountered it before. The memory is woven into the fiber of the present perception,
as Kant suggested.
This concludes my survey of Hume and Kant. My aim here was to support a claim that imagination
is an integral part of ordinary perception, particularly insofar as it allows us to transcend subjectivity
and awaken to objectivity. As we have seen, two eminent figures of Western philosophy also
endeavored in this same vein. From them we have derived a certain disjointed picture of the
imagination (which we shall refine further below) and two good arguments for its essentiality to
perception: first, the analytic proof of its necessity for having ideas of noumena, and second, the
observation by Strawson that images from the past can "infuse" or "come alive in" present perception.
Two cardinal features of imagination are its capacity for making images, and its function in making
what is absent present. Perhaps this latter is what makes imagination such a tempting candidate for that
which gives us the idea that objects can exist beyond our perception of them. As Nietzsche says: "The
dead man lives on, because he appears to the living man in dreams."* It is this ability to make the absent
present that we shall address in the next section, albeit from a different angle.
* Human, All Too Human, 5.
4. What the Apes Lack
Regardless of how clever the higher apes (chimpanzees, gorillas and orangutans) may be, there is a
yawning gulf between them and man which is amply summed up in the word "culture." One
conspicuous feature of culture is its explosiveness (on evolutionary time scales) and its infinite
extendibility, and so one wonders what allowed our distant ancestors to transcend the level of apes. Was
it a quick trick, a sort of "quantum leap"? Or was it a long-term construction of a complex apparatus?
As noted earlier, AI tends to regard man as a tool-box or "kludge" with no general organizing
principle, and thus would appear to favor the latter explanation. For surely it must have taken
evolutionary time to develop the myriad requisite information processing and representation techniques.
I, on the other hand, am inclined toward the former view, i.e. that the explosion of culture was triggered
by a subtle and simple trick.
Two arguments support my view. The first is that it would have taken too long to evolve a complex
tool-box of skills.
Recent techniques of DNA comparison place the split of the human and ape lineages in the range of
2.6 to 8 million years ago (Hasegawa et al. [17], Sibley and Ahlquist [44]). If we assume,
conservatively, that the root lineage did not possess cultural abilities beyond those of apes, and that the
beginnings of culture are marked by the advent of simple stone tools in the lower Pleistocene (about 2
million years ago; see Buettner-Janusch [4]), then we are left with a window of 0.6 to 6 million years
for the development of neural structures and techniques to support culture. This is a paltry figure on the
evolutionary time scale. Evolutionary rates are difficult to gauge and vary for complex reasons, but it
remains implausible that any spectacularly complicated mechanism (like an intelligent Rube Goldberg
kludge) could evolve in 0.6 to 6 million years.
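The window arithmetic above is simple enough to check in a few lines; the date ranges are those cited in the text, and the variable names are mine:

```python
# Checking the evolutionary window cited above: human/ape split dated
# 2.6-8 million years ago (Mya), earliest simple stone tools ~2 Mya.
split_mya = (2.6, 8.0)   # Hasegawa et al. [17]; Sibley and Ahlquist [44]
tools_mya = 2.0          # lower Pleistocene (Buettner-Janusch [4])

window_my = tuple(round(s - tools_mya, 1) for s in split_mya)
print(window_my)  # (0.6, 6.0)
```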
Second, we agree with Buettner-Janusch that "Culture is based upon an ability, a trait, which
appeared during the course of primate evolution, the ability to symbol." (P. 347) Perhaps I am mistaken
but it seems that this ability is not at all complicated, at least in the sense of having many inter-related
parts and mechanisms. Rather it seems, indeed, like a "quantum leap," or a simple and almost magical
new perspective or form of vision (in the broader sense). It seems like a general, all-purpose light which
can in principle be focused onto anything; I do not see how you could have it only part way, or build it
up piece by piece. (That is, could you have a creature which could refer to some things with symbols
and not others?)
So assuming a trick was involved here, what was it? Research by the comparative psychologist
Lorenz [27] furnishes a provocative clue. His work on jackdaws in the 1920s was the first to
demonstrate the existence of cultural traditions among animals. He found that jackdaws, reared in
isolation from their wild and experienced fellows, had not the slightest fear of man, dogs, cats and other
predators. He further determined that the jackdaw has an innate reflex stimulated by the sight of any
animal (even another jackdaw) carrying a flexible, black object (presumably a "dead jackdaw"). When
confronted with such a sight, the adult jackdaw emits "a penetrating rasping, rattling sound" which
spreads through the flock. Once a certain animal has been classed as dangerous through this response,
the flexible black object is no longer necessary; thereafter the reflex is triggered by the animal alone. So
when a young jackdaw sees a cat in the presence of older jackdaws, the older birds are aroused and
communicate their fear to the younger bird through the contagion of affect. Lorenz concluded that the
jackdaws raised in captivity had never been indoctrinated into the "tradition," and thus had never
acquired a fear of predators. Lorenz raises a telling point about such animal traditions:
"There is one vital respect in which these examples of animal tradition differ from human tradition: they are all dependent on the presence of the object with which the tradition is associated. An experienced jackdaw can only tell an inexperienced jackdaw that cats are dangerous when a cat is actually there to demonstrate the fact, and a rat can only teach its inexperienced fellows that a particular bait is poisonous when the bait is actually present. This seems to be true of all animal tradition, from the simplest transmission of conditioned responses to the most complex learning by imitation. This dependence on the presence of objects is probably the obstacle which prevents animal tradition from accumulating in the way it does in man. A specific tradition, such as that of the jackdaws' knowledge of cats, is broken once the object on which it depends fails to appear in the course of one particular generation, and the fact that all animal traditions are thus comparatively short-lived may well prevent their joining up with each other and creating a fund of common knowledge. It is only the development of abstract thought, together with the complementary development of verbal language, that enables tradition to become free of objects; for by means of independent symbols, facts and relationships can be established without the concrete presence of the objects themselves" (P.160-161)
If we, rightfully, define imagination as the ability to make what is absent present, then imagination is
precisely what the jackdaw lacks. And Lorenz pinpoints this lack as the barrier hindering the
development of culture. Kohler [25] came to similar conclusions in his classic study of chimpanzee
intelligence. He found that if a chimpanzee is placed in a cage, with a banana beyond arm's reach
outside the bars, and a stick long enough to draw in the banana at hand, the chimpanzee can grasp the
situation, take the stick and retrieve its prize. But this ability is governed by the following proviso:
"... if the experimenter takes care that the stick is not visible to the animal when gazing directly at the objective—and that, vice versa, a direct look at the stick excludes the whole region of the objective from the field of vision — then, generally speaking, recourse to the instrument is either prevented or at least greatly retarded, even when it has already been frequently used." (P. 37)
This is further reinforced by Kohler's observations of chimpanzee emotions. The chimpanzee's
emotional life is remarkably rich and similar to our own; for instance, they feel shame, plead for
forgiveness, and take out scoldings on weaker comrades. But they show no traces of emotions, such as
grief, which require consciousness of absent objects. Kohler once observed the collapse of an ill
chimpanzee in sight of his comrades. Immediately one of the group ran to help, crying in sympathy. But
once the sick chimpanzee had been taken back to his cage (where he died), the others forgot him and
showed no grief (P. 285-286). These results of Lorenz and Kohler make imagination a prime candidate
for the trick or "missing link" we began this section in search of. But before this hypothesis wins our
full support, it must answer two objections. First there is the case of apes who have been taught
languages using manual signs or shaped blocks. This research has conclusively shown that apes are
capable of some sort of communication about their environment. But their ability to communicate about
spatially or temporally displaced objects is highly retarded. A comprehensive survey of ape language
studies (Ristau and Robbins [41]) indicates that, despite considerable interest in the topic of
displacement, apes have been only rarely and painstakingly encouraged to refer to absent objects, and
then only in the most primitive way. Second there is the fact that apes have a prodigious recognition
memory. For example, Goodall [15] recounts the story of Washoe, a chimpanzee who recognized his
former trainer after a separation of 11 years. Also Kohler found that chimpanzees could see fruit buried,
and then immediately find and dig it up the next morning, 16 hours after they had last seen it.
Such observations do not imply that the chimpanzee entertains images while an object is absent.
Even human beings perform similar feats without evoking imagery. For example, I may recognize an
old classmate after many years without having a single thought of him in the interim, or I may put a
letter in my pocket and forget it entirely until I get to the post office. Furthermore if the chimpanzee has
long-lasting imagery, why does it forget its beloved missing companion, or the stick behind it when it is
desperate for a banana?
This solves part of the problem, but we must dredge further. Recall that I, following Hume and Kant,
have claimed that imagination is an essential ingredient of ordinary perception. How then could a
chimpanzee (or even a dog for that matter) seemingly recognize an individual object given its lack of
imagination?
We can answer this question by distinguishing between three different forms of imagination:
Latent imagination: Part of the imagination's role is to create imprints or copies of experience.
Almost all animals have this kind of imagination in the form of memory. For example, even a jackdaw
must maintain a representation of a previously encountered noxious stimulus in order to recognize and
avoid it. But these passive representations are part of the classification hardware of the creature's
nervous system and cannot be liberated. They are more akin to conditioned responses and do not direct
or manifest themselves in perception. They are inaccessible to consciousness and serve only to classify
what is present, not to revive what is absent.*
Imported imagination: This form not only records, but also revives the recording so that it can
direct and appear in the present perception. It is distinguished from latent imagination by the fact that
images of absent things can now be discerned within a present thing. Perception is no longer a matter of
putting a stimulus into a black box and getting its type out the back. The content of the box is now
accessible and can play a role in directing how the stimulus is apprehended. In latent imagination each
* As we have seen earlier, child psychologists agree that the infant has no concept of independent objects. And yet even a newborn quickly develops the ability to recognize its mother's face. Piaget [38] resolves this problem by appealing to latent imagination. That is, the infant recognizes pictures, not independent objects.
stimulus has a single classification, but in imported imagination the same stimulus can be regarded in
different ways.*
Free-state imagination: We might call this form imagination proper, for it is what allows us to
conjure up images of absent objects even when we are perceiving something unrelated or not perceiving
anything at all. Among its species are dreaming and day-dreaming. Now we are in a position to deal
with our dilemma. Clearly the chimpanzee, and by extension other animals, has an extreme deficiency
of free-state images while awake (although they may dream). They also show a limited degree of
imported imagination and this, I contend, is what makes them intelligent. For example, in the stick
problem, the present perception of the situation must make the stick appear in the light of the
chimpanzees previous experience with sticks.
Finally, we see that the question whether imagination is a necessary ingredient of perception hinges
on what we mean by perception. If perception is a black box classifying stimuli, we only need latent
imagination, which is to say no real imagination at all. But our human perception is much richer than
this. We are not stimulus-response computers which, when given a bar code, recognize it in a passive
mechanical way and output the correct product specifications. We do more than react; we posit the
existence of an ideal world of noumena beyond the bar codes and actively see one bar code in many
different ways.
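The distinction between latent imagination (a black box returning one fixed classification) and imported imagination (stored experience that remains accessible and allows one stimulus to be apprehended in several ways) can be caricatured in code. This is a deliberately crude sketch; every structure and example string below is my own invention:

```python
# Latent imagination: a fixed stimulus -> type mapping ("black box").
# The stored representation can classify but cannot be inspected or
# reused; each stimulus gets exactly one answer.
latent_classifier = {"flexible black object": "danger"}

def classify(stimulus):
    return latent_classifier.get(stimulus, "unclassified")

# Imported imagination: stored experiences stay accessible, and the
# same stimulus can be regarded under more than one of them.
stored_experiences = {
    "necker figure": [
        "cube seen from above",
        "cube seen from below",
        "flat line drawing",
    ],
}

def aspects(stimulus):
    return stored_experiences.get(stimulus, [])

print(classify("flexible black object"))  # one fixed answer: danger
print(aspects("necker figure"))           # several ways of seeing one thing
```

The point of the contrast is only that, in the second structure, the contents of the box play a role in how the stimulus is apprehended, rather than being buried in the classification machinery.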
Now I have gone, and will go, to further lengths to implicate imagination in such processes, so at the
least we may say: imagination is a fundamental factor underlying the richness of human perception. To
this we may add the principal conclusion of this chapter: Imagination, in making the absent present,
appears to be the "missing link" in the intellectual transition from ape to man. So we may reasonably
conjecture that imagination is a linchpin and proceed to a finer analysis of its structure.
* These remarks on imported imagination may seem somewhat cryptic at this point. The phenomenon will be addressed in more detail in the next chapter.
5. The Image
Thus far I have labored to establish a variant of the romantic view of the imagination, i.e. that
imagination is a central and essential feature of human perception and cognition. But obviously this
approach will only benefit the science of AI if we can formalize it. So in this and the following two
chapters I shall attempt to draw a much sharper picture of both the image and its dynamics.
Naively, the best approach to the image would seem to be examining it introspectively in the free
state. But that is a slippery road indeed, fraught with accidental self-deceptions and objective pull, so
we shall instead follow in the steps of Wittgenstein [52] by taking imported imagination as our point of
departure.
Recall that in imported imagination an image of an absent thing invades or directs or changes the
appearance of a present perception. What on earth does that mean? Let us consider an example.
Everyone is familiar with figures like this:
The "Necker Cube"
This cube has the peculiar property of being bi-stable; it has two mutually exclusive, consistent
interpretations. (In fact it is worse than that. Not only can we regard one of the central corners as either
hidden or occluding, we can also regard the figure as planar, with no 3-dimensionality whatsoever.)
Wittgenstein was deeply impressed by this phenomenon and devoted a long passage of the
Philosophical Investigations to its analysis. He calls this ability to fluctuate between alternate views
"seeing-as"—that is, you can see the Necker cube "as this" or "as that." What seems to endlessly
fascinate him is that the same, congruent figure can look entirely different. He writes: "So we interpret
it, and see it as we interpret it." (P. 193) He calls the variant interpretations of the figure "aspects," and
the sudden, startling shift from one view to another the "dawning of an aspect." He stresses the
apparent dual nature of seeing in such a dawning: "I see that it has not changed and yet I see it
differently." (P. 193) What are we to make of this paradoxical change with no change? First of all, it is
absolutely certain that what is changing is not the stimulation pattern on our retinal cells. So the
change must not lie in what we are looking at but in how we are looking at it. Wittgenstein
characterizes this "how" in the following terms:
"I suddenly see the solution of a picture-puzzle. Before, there were branches there; now there is a human shape. My visual impression has changed and now I recognize that it has not only shape and colour but also a quite particular 'organization'." (P. 196)
As Strawson [47] has noted, this harks back to Kant's view of imagination as an organizing power
working within perception. Can we get a sharper idea of this 'organization' Wittgenstein is referring to?
I have heard the tale of a polar explorer who spent hours sketching a distant mountain with two long
tapering ice floes, until the mountain moved and he realized it was a nearby walrus. What changed with
the realization was undoubtedly the way the pieces of the picture fit together and related. For instance,
two ice floes on a mountain have no particular intrinsic connection, no matter how symmetrical they
are; they are just meaningless forms which do not fit into any higher complex of inter-relationships. But
when the seed of the walrus realization begins to sprout, it lacks and needs these two white streaks and
seizes them to complete itself, injecting them with meaning and connection within an organized
complex.
At times I also experience this 'organization' when I wake up. When I was a child and my parents
moved me while I was sleeping, or even today if I go to sleep on my bed the wrong way, I may awaken
with a peculiar, almost dizzy disorientation. The sights around me violate my expectations, and there is
a brief flurry of helter-skelter confusion until I catch sight of a "landmark," so to speak, and the world
spins around to accommodate, everything settling in its proper place.*
Wittgenstein further indicates how our perception can lack this 'organization': "After all, how
completely ragged what we see can appear!" (P. 200) This recalls an observation of Kohler [25] on the
chimpanzee. He found that the chimpanzee has great difficulty conceptualizing a visual scene which is
obvious for human beings. For example, if the ape must unwind a rope coiled neatly on a pole to
achieve its prize, it will haphazardly jerk and thrash with the end as though it were dealing with a
hopeless tangle. Or if the ape must bring a ladder through the cage bars, it does well when the ladder is
almost aligned and the correct movement is visually evident. But when the ladder is askew, the
chimpanzee seems to look at the criss-crossing pattern of bars and rungs as hopelessly
incomprehensible, and begins to angrily thrash. This is not an unknown occurrence even among human
beings; Kohler compares it to his own experience with folding chairs.
Another example in this same vein is provided by the results of Chase and Simon [6] on chess
memory. They conducted experiments wherein both masters and novices were briefly presented with a
board position and then asked to replicate it from memory. It was found that masters were vastly more
proficient when the position was derived from actual chess play, but masters and novices were on even
ground when given a random arrangement of pieces. Chase and Simon suggest that the superior
performance of the masters can be attributed to their having a large stock of stereotypical chess patterns
from which to quickly construct an economical representation of the position. So perhaps we can say
that, to the novice, chess positions in the middle game look as "ragged" as random arrangements of
pieces look to the master. The master, on the other hand, sees more than just a happenstance
arrangement when given a significant position. The pieces cohere and fall together into larger
meaningful complexes, just like the explorer's ice floes which mutate into walrus tusks.
Examples of this type are common in mathematics and science as well. For instance, we have the
case of Gödel seeing that the unique prime number decomposition of an integer could be used to encode
a string of symbols into a single number. Surely this momentous insight and its implications were not
* Minsky [35] relates a similar example: "Suppose you were to leave a room, close the door, turn to reopen it, and find an entirely different room. You would be shocked. The sense of change would be almost as startling as if the world suddenly changed before your eyes." (P. 221)
written all over the face of the prime number decomposition. That is, where others had looked and seen
only something "ragged," Godel saw 'organization.' On a more mundane plane, suppose you have a
right triangle, with acute angles A and B, whose edges are labeled with their lengths. Consider how
your whole manner of regarding the triangle changes when you switch from calculating the sine of A to
calculating the sine of B. And what about the realization that projectiles trace out a parabolic path? I am
inclined to think that this step required the confluence of two streams of human endeavor: the perfection
of long-range artillery and the study of conic sections. The ancients developed the latter, but for some
reason, perhaps the primitivity of their siege engines or the aristocratic distance of science from military
affairs, everyone at the time apparently held a "ragged" view of hurtling rocks. It is hard indeed to
imagine a man, who has spent long hours sketching and playing with parabolas, viewing the rise and
fall of a projectile and not being struck by an aspect—that feeling "Wait a minute... I've seen that
somewhere before." And is it not a common expression in scientific circles: "To solve the problem, you
have to look at it like this," or "Once you see it as a dynamic programming problem, the rest is trivial."
At the risk of beating this sadly neglected topic to death, I would like to point out that this
'organization' is not limited to perception; it equally asserts itself in action. Children are the most
conspicuous examples. As Wittgenstein writes:
"Here is a game played by children: they say that a chest, for example, is a house; and thereupon it is interpreted as a house in every detail. A piece of fancy is worked into it." (P. 206)
I myself have seen my daughter put a non-existent "grandma" on a toy horse, smash a picture of a
snake and use a crayon as a microphone. Adults are by no means immune to this behavior. A friend,
while telling me the story of a basketball game, may whirl and "shoot" to illustrate the dramatic final
play. Or he may mimic the voice of his mother or punch the wall as though it were a person he wants to
hit. Even the military, that great bastion of morbid seriousness, conducts "war-games" and uses sticks as
make-believe rifles. The reader has undoubtedly noticed, in the above examples, a strong connection
with our ordinary notions of imagination (particularly what I call "imported imagination"). The polar
explorer "imagined" he saw a mountain; Gödel’s work was "imaginative"; children are said to have
vivid "imaginations." This connection was not lost on Wittgenstein, who writes:
"The concept of an aspect is akin to the concept of an image. In other words: the concept 'I am now seeing it as...' is akin to 'I am now having this image'." (P. 213)
One of his primary reasons for thinking so is that "Seeing an aspect and imagining are subject to the
will." (P. 213) But he also notes:
"The colour of the visual impression corresponds to the colour of the object (this blotting paper looks pink to me, and is pink)—the shape of [the] visual impression to the shape of the object (it looks rectangular to me, and is rectangular)—but what I perceive in the dawning of an aspect is not a property of the object, but an internal relation between it and other objects." (P. 212)
This "internal relation of an object with other objects" is a common feature of all the examples I have
given. And as Strawson [47] has pointed out, this must require imagination since the "other objects" are
not present.
But to be fair we must grant that Wittgenstein hesitated to view all seeing as "seeing-as." He claims
that "...I cannot try to see a conventional picture of a lion as a lion, any more than an F as that letter.
(Though I may well try to see it as a gallows for example.)" (P. 206) But this cannot be right. For
consider the Japanese symbol "十". In Japanese this is read "juu" and means "ten," but it is also the
symbol for "plus" and bears a suspicious similarity to some versions of the small letter "T". When
reading Japanese, or a mathematical expression, it certainly would take an effort to see this mark as a
"T". In fact "conventional" is the operative word in Wittgenstein's claim. The reason I cannot try to see
a conventional lion as a lion is not that it is psychologically impossible; rather it is because this way of
looking is conventional, i.e., established as a standard by cultural convention. There is no basis for
thinking that a lion really is a lion, any more than there is a basis for thinking that "十" really is "juu."
Even if we take a 100% conventional picture of a standing male lion posed on a white background, I
can formally distort it. I might for instance view it as a bizarre creature with a long, tail-like neck, a
tuft-of-hair head, and a huge, grotesque but useless tail with a face on it to frighten predators. Is this a lion?
And can I not fluctuate between the conventional view (seeing the picture "as a lion") and this perverse
view just as in the Necker cube?
The indisputable fact is that anything can be regarded in myriad ways. This is the trademark that
imported imagination bestows on human perception. All seeing is "seeing-as" or interpretation, even
though some interpretations are more conventional than others. Thus all human vision involves
imagination. We fail to notice this because, for the most part, the world and our images run in tight
lock-step, and often when they do not the world is what gives. Now naive reflection would seem to
confirm that images are Humean, i.e. iconic photographs. But Kant and our look at imported
imagination suggest that the image has a peculiar sort of dynamic 'organization.' Wittgenstein points out
the conflict with the Humean image:
"If you put the 'organization' of a visual impression on a level with colour and shapes, you are proceeding from the idea of the visual impression as an inner object. Of course this makes this object into a chimera; a queerly shifting construction. For the similarity to a picture is now impaired." (P. 196)
This is not just an artifact of imported imagination not shared by free-state images. For Pylyshyn
[39,40] has compiled a wealth of evidence against the Humean image in any form. His basic point
amounts to this: Mental images are not raw and reperceived; they are already interpreted. It is tempting
to believe that an image is like a picture, so that, when someone asks me what color my mail box is, I
recall the picture, look at it with my "mind's eye" and reply "silver."
The problem is not that we are mistaken when we believe we do this. Rather the belief suggests that
the image exists independently of our interpretation of it—i.e. that I can keep probing into the image
and learning new things from it just as I can with an actual photograph.
But if the image is like a photograph, says Pylyshyn, then why, when it degrades, do we lose discrete
conceptual units and relations? For example, in trying to recall an old photograph of my first grade class,
I might remember some people and have forgotten others. Among the people I do remember, I might
have forgotten where they were standing although I know they were there and remember what their
faces looked like. It seems that images never fade, lose resolution, or get their corners torn off like
actual photographs.
Furthermore, Pylyshyn cites a number of experiments indicating that memory images are tightly
bound to how they are encoded. For example, in the chess experiments of Chase and Simon (discussed
above), the masters are not superior to novices in their ability to "photograph" the board, as the results
with random positions demonstrate. Rather the masters have stereotypical concepts into which pieces
can fit, and when they recall a position they are recalling the concepts more than the raw "picture" they
saw. This same principle holds when we listen to someone speak. In general, we do not remember the
exact words of what was said, only the gist. Or consider the following experiment. Mark off a 3x3
matrix like a tic-tac-toe board, and fill in the cells with a random arrangement of digits. Memorize the
numbers and try to see the matrix in your mind's eye. Can you read the diagonals? The rows backwards
and from the bottom? This task nicely illustrates the difference between the two approaches to imagery
(the Humean, and Pylyshyn's approach, which I call the Kantian), as shown in the figure below:
Pylyshyn's evidence (only briefly summarized here), similar arguments by Casey [5], Hebb [19] and
Sartre [42], our earlier remarks on 'organization' in imported imagination, a wealth of anecdotal
evidence, and properly reflective common sense all indicate that the Humean view is incorrect and we
should opt for the Kantian.
We also note that the Humean image does not make much sense in the framework I have developed.
For we have seen that a primary function of imagination is to create noumena— ideal, self-identical
units beyond perception—so that we may interpret the flux of our sense impressions. This function
would be incapacitated if images were raw and required interpretation. We would then be trying to
interpret a welter of sense data by means of a welter of internal imagery, which in turn would require a
"mind's eye" and "mind's eye's imagery," etc., etc.
Furthermore, Pylyshyn's view (that images are already interpreted and do not require reperception)
sits well with another frequently noticed property of images—namely, that we cannot be wrong about
them. For example, it is virtually impossible to conjure up an image of a house, and then realize, on
closer scrutiny, that it is not a house at all; it is actually a cardboard box. The image, in this sense, does
not carry with it any hidden surprises.* It is an outgrowth of the intention which brought it into
existence. As Sartre [42] writes:
"My perception can deceive me, but not my image. Our attitude towards the object of the image could be called "quasi-observation." Our attitude is, indeed, one of observation, but it is an observation which teaches nothing. If I produce an image of a page of a book, I am assuming the attitude of a reader, I look at the printed pages. But I am not reading. And, actually, I am not even looking, since I already know what is written there." (P. 13)
This property is also evident when, for instance, a child deems a scribble to be "Mommy." The child
is not wrong because it is her prerogative to say what the picture is; it is what she meant it to be. If we
assume that this is a factual property of imagery, then it would make little sense to say that the image,
like a picture, is raw and requires interpretation. For then we could very well be deceived by our
images; I might form an image of five apples, and then realize a moment later that there were actually six.
Such images would obviously completely undermine certainty in, among other things, mathematics.
* We shall see momentarily that there is another sense in which an image can hide things.
Up to this point, we have determined that the image has two basic properties: it has 'organization,'
and it is pre-interpreted. I would like now to develop a third and final property, namely that the image
has a temporal structure. Recall that in Chapter 2 on "Linearity" (above, P. 8), we suggested that
memory is a reinstatement of our moment-by-moment perceptual experience in time, and produced
physiological evidence supporting this view. Hebb [19] has suggested that the same applies to the
image:
"If the reader will form an image of some familiar object such as a car or a rowboat he will find that its different parts are not clear all at once but successively, as if his gaze in looking at an actual car shifted from fender to trunk to windshield to rear door to windshield, and so on. This freedom in seeing any part at will may make one feel that all is simultaneously given: that the figure of speech of an image, a picture "before the mind's eye," in the old phrase, does not misrepresent the actual situation." (P. 469)
That is, the image is comprised of a sequence of partial moments which must be journeyed through. To
demonstrate that this partiality of successive imagery moments is not an artifact of his psychological
theory, Hebb cites Binet's [3] reports of imagery in his theoretically naive 14-year-old daughter:
"Asked to consider the laundress, she reported seeing only the lady's head; if she saw anything else it was very imperfect and did not include the laundress's clothing or what she was doing. For a crystalline lens, she saw not the lens but the eye of her pet dog, with little of the head or the rest of the animal; and for a handle-bar, all the front part of her bicycle but missing the seat and the rear wheel." (Binet [3], P. 126, cited in Hebb [19] P. 475)
If the image is a cluster of partial moments, what is the "glue" that holds it together? Hebb [18,19]
proposes a view that we shall examine in detail in the next chapter: that the partial moments are
integrated through correlates of eye (or more generally head and body) movements.
So let us recap the three main features of the image we have derived in this chapter:
1) The image has a peculiar dynamic 'organization' which is particularly evident in imported
imagination.
2) The image is pre-interpreted and does not require reperception.
3) The image is a temporal structure of partial moments, perhaps integrated by correlates of eye (or
head and body) movements.
Let us adopt the mild assumptions that: (A) the 'organization' of an image is some sort of system of
inter-relationships, and (B) anything pre-interpreted must be some kind of unambiguous, canonical
representation. Then properties 1) and 2) above imply that the image is quite like an AI schema. For
example, the image for WALRUS might be represented in a LISP schema as follows:
(WALRUS ?W
  (PART-OF ?B ?W) (PART-OF ?T1 ?W) (PART-OF ?T2 ?W)
  (INSTANCE-OF ?B BODY) (COLOR-OF ?B DARK)
  (INSTANCE-OF ?T1 TUSK) (INSTANCE-OF ?T2 TUSK)
  (COLOR-OF ?T1 WHITE) (COLOR-OF ?T2 WHITE))
All this says is that a walrus is a thing with three parts: a dark body, and two white tusks. The
variables ?W, ?B, ?T1 and ?T2 are open to be bound to an actual instance of a walrus, and its three
components (body, tusk1 and tusk2). This schema has two nice properties which make it akin to an
image. First, it inter-relates various pieces into a complex whole, thus explaining how the image can
lack something, and reach out to seize and interpret it, as happened with the "two white streaks" in the
polar explorer example. Second, its various component symbols—WALRUS, BODY, TUSK,
PART-OF etc.—are completely unambiguous, pre-interpreted and hold no hidden surprises.
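The binding of such open variables is easy to make concrete. The following sketch (in Python rather than LISP, purely for compactness; the matcher and all fact names are my own illustrative inventions, not from any cited system) matches the WALRUS schema against a set of observed features, as in the polar explorer example:

```python
# A hedged sketch: the WALRUS schema as a pattern whose variables
# (?W, ?B, ?T1, ?T2) are bound against observed parts.

WALRUS_SCHEMA = [
    ("PART-OF", "?B", "?W"), ("PART-OF", "?T1", "?W"), ("PART-OF", "?T2", "?W"),
    ("INSTANCE-OF", "?B", "BODY"), ("COLOR-OF", "?B", "DARK"),
    ("INSTANCE-OF", "?T1", "TUSK"), ("INSTANCE-OF", "?T2", "TUSK"),
    ("COLOR-OF", "?T1", "WHITE"), ("COLOR-OF", "?T2", "WHITE"),
]

def match(schema, facts, bindings=None):
    """Return a variable binding under which every schema clause
    appears among the observed facts, or None (with backtracking)."""
    bindings = bindings or {}
    if not schema:
        return bindings
    rel, *args = schema[0]
    for fact in facts:
        if fact[0] != rel:
            continue
        trial = dict(bindings)
        ok = True
        for a, f in zip(args, fact[1:]):
            if a.startswith("?"):
                if trial.get(a, f) != f:
                    ok = False
                    break
                trial[a] = f
            elif a != f:
                ok = False
                break
        if ok:
            result = match(schema[1:], facts, trial)
            if result is not None:
                return result
    return None

# The explorer's scene: a dark mass with two white streaks.
facts = [
    ("PART-OF", "mass1", "thing1"), ("PART-OF", "streak1", "thing1"),
    ("PART-OF", "streak2", "thing1"),
    ("INSTANCE-OF", "mass1", "BODY"), ("COLOR-OF", "mass1", "DARK"),
    ("INSTANCE-OF", "streak1", "TUSK"), ("INSTANCE-OF", "streak2", "TUSK"),
    ("COLOR-OF", "streak1", "WHITE"), ("COLOR-OF", "streak2", "WHITE"),
]

print(match(WALRUS_SCHEMA, facts))
```

For brevity the sketch does not force ?T1 and ?T2 to bind to distinct parts; a fuller matcher would add that constraint.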
However, as we have noted earlier, a key problem with schemata is that they are non-temporal, and
thus skirt our third criterion for imagery. How schemata are built from moment-by-moment subjective
time is generally ignored; the computer has no need to build schemata because that job is relegated to
the programmer (see, for instance, Dyer [11]). How schemata are brought to bear during
moment-by-moment subjective time is often sidelined as well, because schemata are most widely used
in text processing and data base situations where the computer has essentially no contact with the world
we live in (i.e. as in Dyer [11]), and vision and robotics work tend to focus on the pre-conceptual level
(see Marr [31]). So temporality is one point of discontinuity between images and schemata.
Still this does not strike to the heart of the matter. If our analysis is correct then images, regardless of
their temporal nature, seem to lack any of the "pictorial" character which would distinguish them from
concepts. One might feel like Pylyshyn [40], that "...the representation is so obviously selective and
conceptual in nature [that] referring to it as an image—a term that has pictorial or projective
connotations is very misleading." (P. 24) But this is an unsatisfactory viewpoint because it fails to
explain why we feel that, in thinking, images are pictorial and often more expeditious than concepts.
For example, consider the following story problem:
1. B is 1 mile due west of A
2. C is 1 mile due north of B
3. D is 1 mile due east of C
Q. How is A related to D?
Problems of this type are often solved by drawing a picture, either mentally, or on paper, or with a
finger in the air. That is, we do not blindly reason or calculate our way to the answer; we create an
image in which the answer is evident and then see it.* Granted the above problem is very simple. But it
cannot be denied that images are exploited to advantage in a variety of more complex problems. So how
do we explain this if images and concepts are the same thing?
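As a hedged illustration of the imagistic strategy, we can literally construct the "picture" and read the answer off it. The coordinate frame below is my own choice (east as +x, north as +y); the point is that the answer falls out of the constructed structure rather than from a chain of deductions:

```python
# Sketch: solve the A-B-C-D story problem by building the "image"
# (here, plain coordinates) and reading the answer off it.

places = {"A": (0, 0)}
places["B"] = (places["A"][0] - 1, places["A"][1])  # B is 1 mile due west of A
places["C"] = (places["B"][0], places["B"][1] + 1)  # C is 1 mile due north of B
places["D"] = (places["C"][0] + 1, places["C"][1])  # D is 1 mile due east of C

# "Looking" at the finished picture: where is D relative to A?
dx = places["D"][0] - places["A"][0]
dy = places["D"][1] - places["A"][1]
print((dx, dy))  # (0, 1): D is 1 mile due north of A
```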
What images and concepts share is that they are both schema-like structures of inter-relationships.
The difference between them lies in the nature of these inter-relationships. Whereas in a concept the
relations are, so to speak, "objective" or "intrinsic," in an image these relations are, as Hebb suggested,
correlates of eye, head or body movements. Thus, for example, a planar image is a data structure whose
main properties are that it is 2-dimensional, and the operation defined on it is free scansion in any
* Waltz [50] makes this same point, noting the combinatorially explosive deductions confronting
actual AI programs which attempt to solve complicated problems of the above type logically, without images.
direction.
To clarify this rather elusive point, let us consider another example. Suppose we represent a family
of individuals using a tree, like so:

              A
        ______|______
       |      |      |
       B      C      D
    ___|___   |
   |   |   |  |
   E   F   G  H
              |
              I
If we represent this structure conceptually, we obtain something like the following (letting P(x) mean
'parent of x'):
P(A) = nil
P(B) = P(C) = P(D) = A
P(E) = P(F) = P(G) = B
P(H) = C
P(I) = H
On the other hand, if we represent the structure imagistically, we obtain (ignoring movement metrics
and simplifying the potential greatly):
LOOK_UP(A) = BLANK
LOOK_UP&RIGHT(B) = A
LOOK_UP(C) = A
LOOK_RIGHT(C) = D
LOOK_WAY_DOWN(A) = I
LOOK_UP&LEFT(I) = G, etc.
Whereas in an ordinary tree data structure we are constrained to moving within the "objective"
relations (i.e. edges), if we represent the tree as a picture data structure, we achieve much greater
freedom. This is not to say that a picture data structure must contain the results of all possible eye
movements from all possible positions. The point is that theoretically it can, and the greater the freedom
of scansion, the closer the representation approximates a picture.
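A minimal sketch of such a picture data structure follows. The family members are placed on a grid of my own devising (the thesis gives only the LOOK relations, so the exact layout is an assumption), and a single `look` operation supplies free scansion in any direction:

```python
# Sketch of a "picture data structure": members placed on a grid,
# scanned freely in any direction rather than traversed edge by edge.
# Grid positions are (column, row), row 0 at the top; the layout is
# my own rendering of the family tree.

grid = {
    (2, 0): "A",
    (1, 1): "B", (2, 1): "C", (3, 1): "D",
    (0, 2): "E", (1, 2): "F", (2, 2): "G", (3, 2): "H",
    (3, 3): "I",
}
pos = {name: p for p, name in grid.items()}

def look(frm, dx, dy, limit=10):
    """Scan from member `frm` in direction (dx, dy) until something
    (or nothing) is found -- a crude stand-in for an eye movement."""
    x, y = pos[frm]
    for _ in range(limit):
        x, y = x + dx, y + dy
        if (x, y) in grid:
            return grid[(x, y)]
    return "BLANK"

print(look("C", 0, -1))   # LOOK_UP(C) = A
print(look("C", 1, 0))    # LOOK_RIGHT(C) = D
print(look("I", -1, -1))  # LOOK_UP&LEFT(I) = G
print(look("A", 0, -1))   # LOOK_UP(A) = BLANK
```

Note that `look` works from any member in any direction without those pairs being stored: the scansion is computed on demand, which is exactly the freedom an ordinary tree structure lacks.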
The importance of this idea is that it allows us to salvage the image as an explanatory construct, and
rationalize rather than side-step (as Pylyshyn [39, 40] does) the belief that images are pictorial and
useful in thought. But let me stress further that the image is not inherently pictorial; we are not
regressing to the Humean paradigm. An image is a subjective, concrete species of concept which only
gradually and imperfectly approximates a picture.
So in the earlier example of memorizing a 3x3 matrix, your original idea will be closer to a concept
than a picture, as is revealed by tests of free scansion like reading diagonals. But with practice, you
learn the diagonals and odd scanning paths so that instead of painstakingly working them out each time,
you can in a sense "read them off." The greater your facility in reading off odd scanning patterns, the
closer your concept approaches a picture.
This view also shows how we can learn something new from an image, despite the fact that images
are pre-interpreted. For example, many Americans of a certain era have an auditory image of the
"Pledge of Allegiance." So what is the Pledge's last word? I have asked a number of people that
question, and it generally takes them some time to answer—roughly as long as it takes to quickly recite
the pledge. So do these people know the last word of the pledge before I ask? In one sense they do not
because they must take time to work it out and produce it. In another sense, they do because they can
eventually produce it. But being able to eventually produce an answer is a rather dubious criterion of
knowledge. For example, it might be that standard algebraic facts, which any high school student knows,
can be combined in some twisted, counter-intuitive manner so as to prove Fermat's Last Theorem. If
that is true, then any high school graduate could eventually, given years of mental trauma and a little
luck, produce the desired proof. So, by the "ability to eventually produce" criterion of knowledge,
everyone who knows high school algebra also knows the proof of Fermat's Last Theorem!
The point here is that an image can contain hidden aspects, but these aspects lie in the structural
relations which bind the image together, not in an uninterpreted iconic photograph. These hidden
aspects (like the last word of the pledge) make images useful in two ways (both noted by Waltz [50]).
First, we can use them to implicitly and compactly store numerous propositions. For instance, there is
no need for me to store statements like: "The last word of the pledge is 'all'." or "The pledge, read
backward, turned inside out, converted to numbers and squared is such and such." The image allows us
to compute such propositions on an as-needed basis. Second, these hidden aspects are what make
images useful in solving problems like the story problem given above. We can construct an image
according to some specifications, and then process it to find out more about it.
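The first use can be sketched as follows. A single traversable structure is stored, and propositions about it are derived on demand; the stored snippet of the pledge is abbreviated, and the helper names are my own:

```python
# Sketch: an image stores one traversable structure; propositions
# such as "the last word is ..." are computed as needed, not stored.
# The pledge text is abbreviated here purely for illustration.

pledge = "I pledge allegiance ... with liberty and justice for all".split()

def last_word(seq):
    # Derived by traversing to the end, as a reciter must.
    return seq[-1]

def reversed_text(seq):
    # Another proposition nobody stores explicitly.
    return " ".join(reversed(seq))

print(last_word(pledge))  # "all" -- computed on demand, not retrieved
```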
Now I would like also to note that we can extend the planar image if we allow relations involving
bodily movements. For then we can define an analogous 3-dimensional space data structure or "spatial
image." For example, consider an elevator-like robot which moves within a 3-dimensional "shaft"
structure like the following:
At any moment of time the robot can be in any one of the structure's 7 cells A through G. When the
robot is in any cell, it receives a signal on its perceptual side indicating which cell it is in. Also, the
robot has 6 possible actions: IN(I), OUT(O), EAST(E), WEST(W), UP(U) and DOWN(D).
Then this robot's world can be captured in the schematic spatial image shown on the following page.
Spatial Image
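Since the shaft figure is not reproduced here, the sketch below uses a hypothetical 7-cell layout of my own; the point is only the form of the data structure, a table knitting percept to percept through action:

```python
# Sketch of a "spatial image" for the elevator robot: a table of how
# action carries one percept (cell signal) into the next. The cell
# adjacencies below are a hypothetical stand-in, not the thesis's
# actual shaft structure.

# (cell, action) -> resulting cell; absent entries mean the move is
# blocked. Actions: U(p), D(own), E(ast), W(est), I(n), O(ut).
image = {
    ("A", "D"): "B", ("B", "U"): "A",
    ("B", "E"): "C", ("C", "W"): "B",
    ("B", "D"): "D", ("D", "U"): "B",
    ("D", "I"): "E", ("E", "O"): "D",
    ("E", "E"): "F", ("F", "W"): "E",
    ("F", "D"): "G", ("G", "U"): "F",
}

def predict(cell, actions):
    """Predict the percept after a sequence of actions, staying put
    when a move is blocked."""
    for a in actions:
        cell = image.get((cell, a), cell)
    return cell

print(predict("A", ["D", "D", "I", "E", "D"]))  # A -> B -> D -> E -> F -> G
```

The table defines the world entirely in egocentric terms: it says nothing about where the cells "really" are, only how the robot's actions and percepts are knit together.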
The appeal of such spatial images is that they allow us to define a world in subjective (egocentric)
rather than objective terms.* That is, spatial images, like their planar analogues, are encodings of how a
situated subject's perception and action are knit together by a world, rather than how the world itself is
knit together.
This subjective character of images makes them an attractive means of mediating between sense and
concepts, just as Kant suggested. The benefit of developing this notion of mediation is two-fold. First,
as we have seen, the AI schema is a pure concept—objective and divorced from a situated subject's
perception and action, and thus of little help in interfacing with the world. But images have sensory and
motor aspects, as well as conceptual aspects, and thus may usher concepts into contact with the world.
Second, there is substantial evidence (Piaget [38], Vygotsky [49], Flavell [13]) that images are the
* We also note that Tolman [48] experimentally implicates spatial images (what he calls "cognitive maps") as the means a rat employs to represent a maze, and thus perhaps its environment.
cradle of pure concepts; that is, knowledge in the child is sensuous, concrete and image-based.
Sensorimotor images serve originally as surrogates for adult concepts and only gradually acquire
objectivity. So, in sum, the image shows promise of remedying the two major defects of AI schemata:
how they are applied during and built from experience.
6. Feature Rings

In the theoretical analysis of the previous chapter, we suggested, following Hebb [18,19], that the
image is a structure held together by correlates of eye, head and body movements. This claim was not
adequately addressed, so in this chapter we shall review some corroborative results from the field of eye
movement research.
In psychology, the topic of eye movements has had a long and controversial history, and the field has
generated a surprisingly ample literature—only a fraction of which shall be touched on in this chapter.
The principal focus here shall be to describe and analyze an eye movement based theory of memory and
visual recognition originally proposed by Hebb [18] and more recently explored by Hochberg [20],
Noton and Stark [36], Farley [12] and many others.
I. Feature rings
As noted, the theory to be examined here has had a number of incarnations in the work of different
authors, each of whom has developed a different terminology. For the sake of uniformity, this chapter
shall adopt the terminology of Noton and Stark [36].
It is a well known phenomenon that the fixations of a subject viewing a picture tend to occur at
points of high information content such as corners, rapidly changing contours and incongruous objects
(Mackworth and Morandi [30], Loftus [26]). Recordings of the viewing path of a picture therefore look
clustered, with certain focal points attracting the majority of fixations. In studying such recordings,
Noton and Stark [36] observed, like many previous investigators, that the transitions between various
focal points seemed somewhat regular and even appeared to form cycles. For example, in viewing a
portrait, fixations might intermittently return to a certain "beaten track" such as left eye, nose, mouth,
left eye. Noton and Stark called these regular movements "scan paths" and found evidence (albeit
contested by some for statistical reasons) of their existence in a more systematic survey of eye
movement data.
These observations on scan paths led Noton and Stark to propose that these paths play a functional
role in recognition. They claimed that the memory or mental model of an object consists of a directed
simple cycle (as in graph theory) whose nodes are the contents of fixations and whose arcs are labeled
with eye movements. In short, this structure encodes the particular eye movement necessary to get from
one fixation content to another in the original viewed picture. They called such a graph a "feature ring"
—the cyclic structure and term "ring" suggested by the observed cycles in scan paths. (For similar
structures, Hebb [18] uses the term "phase sequence," Hochberg [20] the term "schematic map," and
Farley [12] the term "image.") Noton and Stark further hypothesized that when a subject views
something the first time, a feature ring is laid down and this later directs eye movement during
recognition. To test this theory, they conducted an experiment with two phases. In the learning phase,
subjects were shown 5 pictures they had never seen, each for 20 seconds. In the recognition phase, these
pictures were shuffled with 5 unseen pictures, and each picture in this randomly ordered 10-picture
sequence was presented to the subjects for 5 seconds. The subjects' goal during the recognition phase
was to class the pictures as seen or unseen. Eye movements were recorded during all viewings.
The result of this experiment was that, 65% of the time, the scan path in the recognition phase largely
reiterated that of the learning phase. Noton and Stark point out: "That is a rather strong result in
view of the many possible paths around each picture..." (P. 40) To account for the 35% of the viewings
where a scan path did not occur, Noton and Stark relaxed their feature ring model to something closer to
a general directed graph, and suggested that the scanning process dictated by the feature ring is probable
rather than deterministic.
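A minimal rendering of the feature ring as a data structure may help fix ideas. The feature and movement names below are invented; this is a sketch of the idea, not Noton and Stark's implementation:

```python
# Sketch of a feature ring: a directed cycle whose nodes are fixation
# contents and whose arcs are the eye movements between them.

# (feature expected at this fixation, movement leading to the next node)
portrait_ring = [
    ("left eye", "down-right"),
    ("nose", "down"),
    ("mouth", "up-left"),  # this last arc closes the ring at the left eye
]

def recognize(ring, fixate):
    """Replay the ring on a new picture: execute each stored movement
    and check the expected feature. `fixate(move)` returns the feature
    found after executing `move` (None = the initial fixation)."""
    move = None
    for feature, next_move in ring:
        if fixate(move) != feature:
            return False
        move = next_move
    return True

# A picture that yields the expected features under the ring's moves:
script = {None: "left eye", "down-right": "nose", "down": "mouth"}
print(recognize(portrait_ring, script.get))  # True
```

The same structure fails gracefully on an unfamiliar picture: as soon as a stored movement lands on the wrong feature, the replay aborts.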
II. Problems with feature rings
The feature ring theory is controversial, and a variety of objections have been made in the literature.
For instance, Groner et al. [16] claim that there is no evidence suggesting that visual features are
stored with eye movement components. But that is an oversight; for dreams, which surely are built from
scraps of memory, are accompanied by eye movements. Dement and Kleitman [8] have shown
experimentally that these REMs (the rapid eye movements which accompany dreaming) are correlated
with dream content. For instance, a subject awakened after one minute of almost purely horizontal eye
movements reported dreaming of "two people throwing tomatoes at each other," and another subject
awakened after a similar period of vertical eye movements "dreamed of standing at the bottom of a tall
cliff operating some sort of hoist and looking up at climbers at various levels and down at the hoist
machinery." (P. 344) It would be stretching things to claim that the dream is "projected" somewhere and
the dreamer must move his eyes to see different parts of it. For if it is "projected" anywhere, it is
projected inside the brain*, and clearly moving the eyes will not affect vision of that picture one iota. So
the pictures and eye movements of the dream must be correlated by some memory mechanism like a
feature ring.
Groner et al. further claim that feature rings would be an inefficient form of storage. But as we have
seen earlier, Penfield and Roberts [37] have shown that memories—even very insignificant
memories—are stored in motion-picture-like strips in the brain. Unfortunately, Penfield and Roberts do
not comment on the eye movements of the patients during these episodes, but I believe my basic point
has been made. That is, we have at least one good reason for considering this prima facie inefficient
encoding.
Another difficulty has been the question of the statistical reliability of the results on scan paths, i.e.,
it is difficult to discriminate between a "bona-fide" scan path and a random artifact (Groner et al. [16],
Stark and Ellis [46]). This is a knotty question, but I believe that the above evidence of dreams and the
considerable theoretical interest of the feature ring theory (which hopefully is apparent in light of the
previous chapters), demand that we go beyond this quibble. We must not call off the race before the
horses are even out of the chute. That is to say, we may profit by pursuing the deeper questions of the
mechanics and workability of the theory, and to this we now turn.
1. The Interpretation Problem
In the previous chapter, we marshaled a number of psychological and philosophical arguments
indicating that an image is not raw and uninterpreted. Since the appeal of feature rings lies in their
kinship to the notion of imagery we have developed, we are confronted with the problem that nodes of
feature rings (in the Noton and Stark model) are uninterpreted, iconic imprints of fixation contents.
* It could not be projected in the eye itself (equally futilely) because the optical pathway is composed solely of afferent fibers.
could be remedied, however, by making feature ring nodes interpretations of fixation contents.
Such a move is rendered plausible by the old and elegant eye movement based theory of visual
ambiguity (Gale and Findlay [14], Hochberg [20], Stark and Ellis [46]). To see how this theory works
consider Wittgenstein's "duck-rabbit":
The Duck-Rabbit
This figure, like the Necker cube, has two aspects—a duck and a rabbit—and we saw, in discussing
Wittgenstein, that when we alternate between the two aspects, what appears to vary is the 'organization'
of the picture elements. The eye movement theory of ambiguous figures states that this shift in
'organization' is intrinsically linked with how you scan a figure—where you look, and where you avoid
looking. One can get a sense of why this theory is so old and persistent just by viewing the duck-rabbit
for a while, alternating between the two aspects, and observing how the duck and rabbit appear to be
associated with different ways of scanning the figure. For instance, viewing the right side (the rabbit's
nose) seems more tightly bound to the rabbit interpretation, and the left side (the duck's bill) to the duck
interpretation. To account for these facts, Gale and Findlay [14] have proposed that each possible
fixation content has multiple interpretations. For example, the protrusion on the right side of the
duck-rabbit can be viewed as either "rabbit ears" or "duck bill." These interpretations are related
together into complex wholes (which constitute higher level interpretations) by eye movements, and the
picture element interpretations have varying probabilities of evoking complexes in which they occur.
Shifting between aspects is hypothesized to occur when the current fixation interpretation is more
tightly bound to a complex other than that currently aroused. This theory has been, to a reasonable
extent, confirmed (Gale and Findlay [14], Stark and Ellis [46]). And it offers two advantages over
exotic, anti-structural explanations like bi-stable neural nets (Marr [31], P. 25). First, it has some
grounding in empirical findings, and second it provides a probable and more refined idea of how images
can "come alive in" or "infuse" current perception—a commonplace and factual occurrence, as we have
seen.
So we shall adopt this revision of the feature ring theory which transforms feature rings into what we
have earlier called planar images. (We shall, however, retain the term feature ring for the duration of this
chapter.)
2. The Parallel vs. Serial Problem
Another objection to the feature ring theory is this: If recognition requires eye movement, how do we
account for the fact that (some!) images can be recognized when presented with a tachistoscope so
quickly (on the order of a few hundred milliseconds) that eye movement is impossible? Similarly, what
about images that are so small that they can fit into the 1-2° visual angle of the fovea and thus require
no eye movement to be seen clearly and recognized?
Two basic responses have been given here, but before analyzing them, I would like to briefly combat
the extreme and unsatisfactory conclusion that the feature ring theory is 100% erroneous. The main
difficulty with assuming that all recognition is parallel, instantaneous and astructural is that it is not. For
example, if I am involved in a game where I must determine whether a given car is my car, and the car
is superficially identical to mine and moreover I will be electrocuted if I misjudge, then clearly my
recognition must and will involve something more structural than a glance.* And this is not an artificial,
pathological case. The world is (and was especially in less pampered times) filled with life and death
situations dependent on a recognition. This man I am drinking with appears to be a friend—he wears
* This "car game" is analyzed in more detail on P. 59 below.
our uniform and speaks our language—but perhaps he is my assassin. Shall I judge by a glance?
Having noted this problem with the pure parallel view, let us now consider the two ways parallel and
serial recognition have been reconciled.
a) Hebb: Hebb [18] was deeply influenced by the work of Senden [43] concerning congenitally
blind persons who were given sight during adulthood through removal of their cataracts. It was found
that these persons have counter-intuitive difficulty learning the skills of visual recognition. He cites one
striking example given by Miner [33]: "Miner's patient, described as exceptionally intelligent despite
her congenital cataract, two years after operation had learned to recognize only 4 or 5 faces and in daily
conferences with 2 persons for a month did not learn to recognize them by vision." (Hebb [18], P.105)
Hebb stresses that at-a-glance recognition of the type exhibited in tachistoscope experiments is only
developed after a long, arduous learning process which in the normal person occurs in infancy. Hebb's
essential idea of this process is as follows. When a congenitally blind person is given sight in adulthood,
he or she has tremendous difficulty recognizing something as simple as a triangle, and must resort to
techniques like counting corners. The slightest change in the set-up—different lighting or a different
backdrop—completely disrupts a previously skillful recognition. But with practice, the counting of
corners and so on becomes more and more rapid and smooth until the person is capable of recognizing
the triangle "at-a-glance."* Hebb believes this is achieved through the formation of what he calls
"assemblies," and gives the following illustrative example:
"Let us say an infant has already developed assemblies for lines of different slope in his visual field. He is now exposed visually to a triangular object fastened to the side of his crib, so he sees it over and over again from a particular angle. Looking at it excites three primary assemblies corresponding to the three sides. As these are excited together, a secondary assembly gradually develops, whose activity is perception of the object as a whole—but in that orientation only. If now he has a triangular block to play with, and sees it again and again from various angles, he will develop several secondary assemblies, for the perception of the triangle in its different orientations. Finally, taking this to its logical conclusion, when these various secondary assemblies are active together or in close sequence, a tertiary assembly is developed, whose activity is perception of the triangle as a triangle, regardless of its orientation." (Hebb [19], P. 472)
* Compare with the remarks on pp. 41-42 on how concrete concepts gradually take on the characteristics of images with practice.
Thus the final result is that the infant develops what we might call a disjunctive normal form (DNF)
detector for triangles. That is, primary and secondary assemblies are template matchers which respond
to static patterns on the retina—edges and triangles respectively. The secondary assemblies, which each
respond to a particular triangle on the retina (with fixed size, shape, position and orientation), are
connected to a large OR gate (the tertiary assembly) which responds when any one of its inputs
responds. Hebb's view has two basic problems. First, if the result of the learning process is a DNF
triangle detector, we would need a separate secondary assembly for each of the immense variety of
triangles which can be inscribed in the retina. Moreover, the detector would not seem to be very useful
in allowing us to differentiate between triangles, or to see (interpret) one triangle in various ways
(e.g. when calculating the sines of different angles in the same triangle).
Second, it seems, as Hebb points out, that tachistoscope recognition is error prone and limited.
Hence even if his hypothesis is correct, it must have an upper limit which he does not acknowledge (cf.
the car game, above and on P. 59 below). We cannot in general expect people to classify objects of high
complexity with any precision in a tachistoscope presentation. They might be able to tell you the object
was a polygon, but it is doubtful whether they could tell you it had exactly 27 sides, etc.
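The combinatorial objection can be made concrete with a minimal sketch, rendered here in present-day Python for definiteness; this is my own illustration, not Hebb's formulation, and the coordinate patterns are invented:

```python
# A sketch of Hebb's assembly hierarchy as a DNF detector. Each
# "secondary assembly" is a template matcher that responds to one exact
# retinal pattern; the "tertiary assembly" is an OR gate over them.

def make_template_matcher(pattern):
    """Secondary assembly: responds only to one exact retinal pattern."""
    return lambda retina: retina == pattern

def make_or_gate(matchers):
    """Tertiary assembly: responds when any secondary assembly responds."""
    return lambda retina: any(m(retina) for m in matchers)

# Two specific triangles, each frozen in size, position and orientation.
triangle_a = frozenset({(0, 0), (4, 0), (2, 3)})
triangle_b = frozenset({(1, 1), (5, 1), (3, 4)})

triangle_detector = make_or_gate([
    make_template_matcher(triangle_a),
    make_template_matcher(triangle_b),
])

print(triangle_detector(triangle_a))  # True
# A slightly different triangle needs yet another secondary assembly --
# exactly the combinatorial explosion objected to above.
print(triangle_detector(frozenset({(0, 0), (4, 0), (2, 4)})))  # False
```

Note that the detector answers only "triangle or not"; nothing in it supports differentiating one triangle from another, which is the first objection above.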
So we shall reject Hebb's theory of "assemblies," and provide an information-theoretic mechanism
which I will call the "Trademark Heuristic." Recall that Gale and Findlay [14] propose that each
possible fixation content has interpretations, and these local interpretations are integrated, via eye
movements, into larger structures which in turn constitute higher level interpretations. Now what
happens if, through experience, a certain fixation content is monopolized by a certain higher level
interpretation?
Then we have a local feature which serves as a unique marker for a higher level entity. For instance,
a sequined glove on one hand is a sort of trademark for Michael Jackson, and will belong to his feature
ring and few others. So if we by chance fixate on that glove in a tachistoscope presentation, it will give
us tremendous information—in the sense that it drastically cuts down our uncertainty about which
feature ring applies. And this in turn allows us to recognize Michael Jackson at-a-glance.
This account is consonant with the results of Gale and Findlay [14]. They found that, when a large
ambiguous figure was painted onto a subject's retina as an after-image using a flashgun (to eliminate the
effects of eye movements), the aspect seen depended on which part of the image was painted onto the
fovea. This suggests, as the Trademark Heuristic predicts, that the "gist" obtained from brief exposure
to a picture depends on the part of the picture fixated.
There are, however, two facts which escape the Trademark Heuristic:
i) Intraub [22] has demonstrated that encoding within a single fixation need not, under certain
circumstances, be an all-or-nothing detection phenomenon. The amount of information extracted from a
single fixation can vary depending on the length of the fixation, and the attention devoted to it (i.e. we
may decide how deeply to encode a certain fixation).
ii) Shifting of ambiguous figures and illusory length distortions still occur when an illusion is
stabilized on a subject's retina (Gale and Findlay [14], Coren [7]).
So let us consider a second approach to the Parallel vs. Serial problem:
b) Noton and Stark: Noton and Stark [36] propose that the usually overt eye movements are carried
out by the practiced adult as internal shifts of attention in the case of a small picture, and describe
experiments which support this view.
In the first type of experiment, subjects are presented with an array of designs, each small enough to
be apprehended in a single fixation, and asked to find a "target" design. Such experiments reveal that
the subject requires more time to recognize the target than to reject non-targets. This suggests that some
sort of sequential checking process is involved; i.e. the subject is not equipped with a parallel "detector"
which he sweeps over the array until it signals the presence of the target. Furthermore, when the
complexity of the target design is varied, target recognition time varies proportionally. This also runs
counter to the view that the subject develops an instantaneous, astructural detector for the target. Noton
and Stark also recount an experiment wherein a subject viewed a small drawing of a cube, and indicated
after randomly chosen intervals where he thought he was looking. At the same time, the subject's eye
movements were recorded. It was found that the subject's eye movements deviated very little from the
center of the drawing, whereas the points at which he felt he was looking were widely dispersed over
the figure. The authors regard this as evidence that attention can be and is internally directed onto parts
of a picture small enough to fit into a single fixation. Gale and Findlay [14] also postulate internal
attention mechanisms as allowing the shift between aspects of a retinally stabilized image. They write:
"The ability to alternate the perception of stabilized images is hypothesized here to be a function of the generally small size of the stimuli, such that attention can be moved about the stimulus without the need for eye movements. When large stabilized images have been employed and eye movements recorded then movements have been found despite their futility." (P. 148)
So Gale and Findlay maintain that the process underlying the shifting of ambiguous figures is
selective attention. Eye movements are an expression of this selectivity, but may not be necessary if the
viewed figure is small (and simple) enough. Our position on the Parallel vs. Serial Problem can thus be
summarized as follows. A large number of cases of recognition (like the car game (see P. 59 below) and
large pictures) require a serial process. Tachistoscopically presented images can be roughly classified
using the Trademark Heuristic, and more precisely defined through attentional operation on the
decaying sensory icon (Hochberg [20]). Small images and retinally stabilized images (which are
generally small) are handled through a serial process involving internal shifts of attention. So we shall
contend that single perceptual moments (i.e. attention fixations, with or without eye movements) can be
interpreted in parallel, but all other recognition is either overtly or covertly structural.
To accommodate the Noton and Stark theory, we could modify the feature ring so that its nodes are
interpretations of attention fixations, and its edges are labeled with either eye or attention movements.
This, however, would not affect the main tenets of the theory, so in the rest of this chapter we shall
simplify things by retaining the language of eye movements. The extension to attention fixations and
movements is just a reinstatement of the same process on a new plane, and shall be left implicit.
Now, having digested the Parallel vs. Serial Problem, let us bring up two more problems with feature
rings.
3. The Serial Recognition Problem
Noton and Stark's idea of the feature ring based recognition process has the following serious flaw.
Assuming that a person has a collection of feature rings in memory and is presented with an image, the
problem of recognition amounts to finding the feature ring which best fits the data. Clearly the person
cannot apply all the feature rings at one time since they are liable to specify conflicting eye movements.
On the other hand, applying them all in round-robin fashion would be computationally expensive and
contradictory to the known facts of human performance. Moreover, we cannot select the "right" feature
ring right off the bat, because if we could do that we, paradoxically, would not need the feature ring.* In
short, Noton and Stark's hypothesis that the feature ring "directs" the scanning and recognition process
becomes vague and impossible when multiple feature rings come into play.
4. The Termination Problem
The last problem with a feature ring is the fact that it is a general directed graph, and a ring has no
natural starting or stopping point. We usually regard a recognition process as having a beginning and
then an end where a definite judgment is made, i.e. "That is a guitar." So do we break up the ring, and then write the recognition algorithm so that
every single link must be checked? That would bring us back to the deterministic feature ring model
which Noton and Stark saw as contradicting experiment.
These last two problems stem from the vagueness of the theory and so we shall attempt to
approximately resolve them below.
III. Clarification
The feature ring theory says that a person's mental model of something (i.e. a car, a tree, a friend
etc.) is a feature ring, i.e., a directed graph whose nodes are interpretations of fixation contents and
whose arcs are labeled with eye movements. So in some sense, we can think of a person's intellect as
* This paradox applies to any recognition scheme using conflicting serial procedures.
containing numerous feature rings, all named or labeled according to what they represent. Given a
picture, the task is similar to string matching in that we must find occurrences of the feature rings
(=strings) in the picture (=text). As has been noted above ("The Serial Recognition Problem," P. 55), we
can neither try "fitting" all the feature rings at once nor apply feature rings in round-robin fashion.
So how do we apply feature rings?
#1) Fact: In a single fixation, an observer determines the "gist" of a picture (Biederman et al. [2],
Intraub [22], Loftus [26]). This information is sketchy (Hebb [18]) and deficient regarding the
inter-relationships of objects in the scene (Biederman et al. [2], Farley [12]).
#2) Consensus: This "gist" aids in determining the destination of the next saccade (Biederman et
al. [2], Intraub [22], Loftus [26]). These phenomena can be understood in terms of the Trademark
Heuristic. That is, when we make our initial fixation on a picture, we interpret the fixation content.
There may be many interpretations or few, and these interpretations will only be part of certain feature
ring complexes. We can rule out the feature rings which do not contain the current interpretation, and
thus avoid wasting time trying to "fit" rings which will never fit due to their gross or local
physiognomy.
This gives us a partial answer to the Serial Recognition Problem: We pare down the set of feature
rings to be applied, using the interpretation or "gist" obtained at the initial fixation. But suppose the
original "gist" does not narrow the field down to a single possibility. Then what do we do?
#3) Fact: Fixations occur at "informative" details (Loftus [26], Mackworth and Morandi [30]).
After the initial fixation, we have a set of potentially applicable feature rings, and we desire to pare
down this set as quickly as possible. So we could use the probabilities associated with fixation content
interpretations (or a measure derived from them) to choose as our next saccade destination that which
most increased our certainty about which feature ring to apply—i.e., that yielding the most information
in the information-theoretic sense.*
* Loftus [26] defines "informative": "An object in a picture is informative to the extent that it has a low conditional probability of being there given the rest of the picture and the subject's past history." (p. 503)
So far this discussion amounts to a fairly satisfactory answer to both the Serial Recognition and
Termination Problems. There is no preset algorithm for recognition which runs through branchless
stages. Rather we start off with a field of possible interpretations and whittle down that field as quickly
as we can by conducting the most informative experiment possible at each saccade. When the field has
been narrowed to one possibility, recognition is complete. So the amount of effort and time involved in
a recognition varies depending on the information yielded by each fixation, and thus on the context.
Furthermore, we note that this version of the feature ring theory can account for the statistical doubts
regarding scan paths mentioned above (P. 48) since the feature rings do not enforce an exact, previously
ordered scanning path. The path is, so to speak, determined "on the run."
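The saccade-selection rule of #3 can be sketched in information-theoretic terms: saccade to the point whose possible contents, averaged over the current candidate feature rings, leave the least residual uncertainty. The duck-rabbit priors and predictions below are invented toy numbers, not data from the cited studies:

```python
import math

# A toy sketch of choosing the next saccade by expected information gain.
# Each candidate feature ring predicts what will be seen at each point.

priors = {"duck": 0.5, "rabbit": 0.5}
predictions = {                      # ring -> {location: predicted content}
    "duck":   {"left": "bill", "right": "eye"},
    "rabbit": {"left": "ears", "right": "eye"},
}

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_entropy_after(location):
    """Residual uncertainty, averaged over what might be seen there."""
    outcomes = {}                    # observed content -> unnormalized posterior
    for ring, p in priors.items():
        outcomes.setdefault(predictions[ring][location], {})[ring] = p
    exp_h = 0.0
    for posterior in outcomes.values():
        p_obs = sum(posterior.values())
        exp_h += p_obs * entropy({r: p / p_obs for r, p in posterior.items()})
    return exp_h

# Saccade to the point that leaves the least expected uncertainty.
best = min(predictions["duck"], key=expected_entropy_after)
print(best)  # 'left': the bill/ears region separates the aspects; the shared eye does not
```

On these numbers the left region reduces the entropy from one bit to zero, while the shared "eye" region yields nothing, so the rule sends the eye to the ambiguous figure's most diagnostic detail.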
The recognition process just described illustrates, in germ form, the following ubiquitous and central
principle:
#4) Consensus: Perception is a complex, multi-form, interlaced process of bottom-up and top-down
control (Farley [12], Gale and Findlay [14], Hebb [18], Hochberg [20], Loftus [26], Minsky [35], Waltz
[50]).
This holds in the recognition process because the choice of saccade destination depends on the
obtained "gist" (bottom-up), and the interpretation probabilities and feature ring inter-relationships
(top-down). But from another angle, we can view the recognition process as bottom-up control which
terminates when an unequivocal interpretation has been reached. At that point the feature ring
associated with the interpretation takes over, and control shifts to the top-down mode. And yet even
within this feature ring guided top-down process, we may saccade to the most informative point (within
the feature ring), thus modulating the top-down with some bottom-up. This would check the
characteristic flaw of top-down control—riding roughshod over the facts—and also allow us to reject
our initial hypothesis as quickly as possible if it can in fact be rejected.
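This ebb and flow can be caricatured as a loop that scans bottom-up while several feature rings remain live, and top-down once the field narrows to one ring, while still testing each of its predictions against the data. The sketch is my own schematic, with invented scene contents, not an algorithm from the cited sources:

```python
# A schematic of interlaced bottom-up and top-down control: the data
# drive the scan until one feature ring dominates; thereafter that ring
# directs the scan, but every prediction is still checked, so a wrong
# hypothesis is rejected rather than ridden roughshod over the facts.

scene = {"left": "ears", "top": "eye", "right": "nose"}   # invented data
rings = {
    "rabbit": {"left": "ears", "top": "eye", "right": "nose"},
    "duck":   {"left": "bill", "top": "eye", "right": "head"},
}

def recognize(scene, rings):
    live = set(rings)                # feature rings not yet ruled out
    trace = []
    for location, seen in scene.items():
        trace.append("top-down" if len(live) == 1 else "bottom-up")
        live = {r for r in live if rings[r].get(location) == seen}
        if not live:
            return None, trace       # every hypothesis rejected
    return (live.pop() if len(live) == 1 else None), trace

result, trace = recognize(scene, rings)
print(result, trace)  # rabbit ['bottom-up', 'top-down', 'top-down']
```

After the first fixation rules out the duck, control shifts to the rabbit ring, yet each later fixation remains a genuine test that could still falsify it.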
IV. Conclusion
In this chapter we have presented evidence (relating to dreams) that at least some memories
inter-relate momentary impressions and eye movements, thus suggesting that the planar image data
structure proposed theoretically in the previous chapter is psychologically realistic. Moreover, we have
seen how feature rings (planar images) can be used to explain visual ambiguity, and have thereby
clarified our idea of how images can manifest themselves in or direct perception (i.e. by controlling eye
movements). The feature ring theory also shows us how an image can be created from experience:
through a direct recording mechanism of the type hinted at by Penfield and Roberts [37] (the main
problem being how interpretations develop and differentiate; these issues are discussed in the following
chapter).
We also must point out some inadequacies. First, the feature ring theory deals only with eye (and
attention) movements, and neglects the treatment of head and body movements which any general
account of imagery would require. Second, and more important, the theory implicitly sanctions the view
that all memories are images. This is incorrect, because some memories are pure concepts and have no
eye movement components. Since the theory downplays pure concepts, it fails to tell us how to get from
images to concepts, and thus liberate our thinking from the tyranny of the concrete.
Still, feature rings have pointed out a key aspect of the dynamic behavior of imported images—that
they must come into play within a complex ebb-and-flow of bottom-up and top-down control. This
dynamism is the concern of the next chapter.
7. Hypotheses, Formation, Application
To begin, let us consider an example (albeit rather far-fetched) which illustrates a major theme of
this chapter. Suppose a man plays the "car game" introduced in the previous chapter; that is, he is
presented with a car, and must determine whether it is his or not. The penalty for wrong judgment is
death. How would the man approach this problem? Assuming the penalty motivated him, he would
probably proceed to the car and inspect it, principally by examining defects which another person
would be unlikely to notice. For example, he might know that his car has a broken weld on the exhaust
pipe which he noticed once when the car was jacked up for an oil change. So he gets down under the
car with a flashlight and looks at the weld, finding that it is indeed broken. So this is a confirmation, but
there is a rusty wire around the pipe which seems unfamiliar. Had the wire been there before and
escaped his notice, or was it a discrepancy? He weighs this discrepancy according to the certainty of his
memory and decides to leave it as a nagging unsettled question. He tries a whole series of
experiments—checking for the creak in the door, the sticky point on the accelerator, the chain under the
seat—and each time is confirmed. So he is leaning toward the opinion that this is his car. But then he
notices the brand name on the tire. He does not remember what the brand name on his tire was, but his
tires were from California and definitely did not have Arabic markings like this one. Moreover, the man
who proposed the game looks vaguely Middle Eastern. A light is going on here, and he rechecks the
ashtray which he only examined cursorily before, thinking the cigarette butts were his. On closer
inspection he finds a cigarette which also bears Arabic markings. His working hypothesis now is that
the car was brought from somewhere in the Middle East and doctored to look and behave like his car.
He waits until the man falls asleep and searches his wallet. He finds suggestive evidence: a duty receipt
placing a car, with a serial number matching that of the car being inspected, in Malta two days earlier
when he was driving his car to South Carolina. This would seem to confirm his new hypothesis, but
nagging questions arise. Could the man have forged the document, and tampered with the serial number
and tires of his own car? Or maybe the serial number is that of his own car; he does not remember it.
The man seemed to fall asleep suspiciously easily. Is the new hypothesis just a carefully choreographed
red herring? And so on. The man proceeds like a detective, noting discrepancies and confirmations,
weighing their certitude, framing new hypotheses. Then once he has digested everything to his
satisfaction, he embarks on an intensive, multi-dimensional, conscious weighing process. He leans one
way, then another, until satisfaction builds to the point where it discharges itself in a judgment: "Yes, it
is my car."
This example is contrived, but it derives some merit from the fact that it not only constitutes a case
of structured, serial recognition (i.e. a typical recognition), but also obviously exemplifies the basic
paradigm of scientific reasoning. Let us see how this "Hypothesis Paradigm" might connect to the main
themes of this thesis.
1. Hume: As noted earlier, Hume saw the world as structured by a peculiar, orderly "coherence" which
the imagination exploits to create noumena. For instance, I may leave a fire burning in the hearth, and
find it a pile of smoldering embers when I return. So I am confronted with two disparate appearances,
and yet I view them not as entirely distinct things but as manifestations of the same underlying thing (i.e.
a noumenon). Clearly this unification cannot be attributed to similarity of appearance (Hume's
"constancy"), so Hume attributes it to what he calls "coherence." That is, if I have previously seen a fire
mutate into embers, I can unite the two disparate appearances by means of this connective experience.
Thus for Hume, the ability to predict transformations of an object's appearance is the essential
component of the world's coherence. Our world is coherent because it is a tight fabric woven together
by bonds of predictability, and this, Hume contends, is in part what makes us attribute continued
existence to objects.
We can see how this might work with two examples. First, suppose that we have a coin, hidden in a
small ring box, which we check up on periodically. Then if the coin is there on two successive
inspections (involving looking, weighing, monitoring by a device etc.), we assume it was there between
the inspections. We feel certain that the coin did not blink out of existence while our back was turned,
for if coins were always disappearing like this, then surely someone somewhere would have caught one
doing so, either by accident or stealth. But then again, cannot we imagine an empirically inaccessible
demon or even mindless necessary mechanism which blinks out the coin as soon as the coast is clear
and rematerializes it when someone looks? Not really, for the kink in this hypothesis is the relationship
between what is "visible" (thus existing) and what is "invisible" (thus non-existing). For instance,
consider the core of the earth. If at some moment no one is monitoring or even cares about its existence
then by hypothesis it would disappear, causing a massive cataclysmic collapse of the earth's surface.
We might somehow explain the non-occurrence of this collapse by postulating a mechanism which
makes the earth's surface behave as though the core were there, even when it is not. But here we begin
to tempt Occam's razor*, and one asks: Why not let the core itself be this mechanism?
Objects do not exist in a vacuum; they are inter-connected in a web of predictable cause and effect—
i.e. they are coherent. If I enter a room and find the couch frayed, this is evidence that the cat was
existing while I did not perceive her. If I tip an hour-glass and leave it in a room where no one is
perceiving or monitoring it, I return and find it has marked time exactly as it should in my absence. If it
blinked out of existence when I turned my back, what mechanism calculated the proper distribution of
sand for reinstatement? If I am a child holding a block I can still feel it when I close my eyes, thus
showing that things can exist independently of my vision of them. Likewise, I can release the block
while viewing it, demonstrating that it can exist independently of my touch. In short, objects leave
traces of their continued existence and the most parsimonious way to account for these traces is to
complete the partial coherence by positing noumena. Another example supports Hume's line of thought.
Consider things like ghosts which are widely believed to be unreal products of the imagination. It is not
the wispiness of ghosts or the fact that they are supposed to be the souls of the dead which makes them
unreal. It is the fact that they violate the laws of coherence. For if ghosts left traces of themselves, and
there was some rationale behind their appearance so they could be conjured up in a scientific laboratory
in repeatable experiments, then they would undoubtedly be accepted as actual mind-independent
phenomena, regardless of their immateriality. So how does imagination exploit this coherence to create
noumena? It does so by creating images, which, as we have seen are complexes of interpretations held
together by correlates of movement. But more than that, each image is already a hypothesis because it is
* A maxim attributed to the medieval scholastic William of Occam: "Do not multiply explanatory entities beyond necessity."
comprised of experiments which predict how the world will behave. For example, in a feature ring
(planar image), the basic unit of information is: "If I am looking at x and move my eyes in direction y I
will see z." Similarly, in the car game the man exploits the spatial image of his car to classify a
particular experience under a noumenon. For instance, he knows: "If I am sitting in my front seat, and I
bend over and look under the seat, I will see a chain."
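These predictive units can be sketched as a table of (fixation, movement) → expected-content triples, each an experiment the world may confirm or refute; the car details below are invented to echo the car game and are not a data structure given explicitly in the text:

```python
# A sketch of a planar image as a set of predictions of the form
# "if I am looking at x and make movement y, I will see z".

car_image = {
    ("front seat", "bend down and look under the seat"): "chain",
    ("driver door", "pull it open"): "creak",
    ("exhaust pipe", "look at the weld"): "broken weld",
}

def run_experiment(image, fixation, movement, world):
    """Check one prediction of the image against what the world shows."""
    predicted = image.get((fixation, movement))
    observed = world((fixation, movement))
    return predicted == observed

# A world behaving exactly as the image predicts confirms the hypothesis;
# a discrepancy (the rusty wire) is grounds for revising it.
print(run_experiment(car_image, "front seat",
                     "bend down and look under the seat", car_image.get))  # True
print(run_experiment(car_image, "exhaust pipe",
                     "look at the weld", lambda key: "rusty wire"))        # False
```

In this form the image is literally a bundle of hypotheses: to apply it is to run its experiments, and the noumenon "my car" is the posit that makes all the confirmed predictions cohere.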
So in short we are claiming this: The world is a coherent web of predictable cause and effect
relationships. A child is born into this world and begins to form images—structures recording how the
world has been, and thus predicting how it will be, in terms of the child's own perceptions and actions.
But these structures continually collapse under the weight of their discrepancies and reform on ever
higher, more adaptive planes. Finally to achieve optimal consistency and prediction, these images
(hypotheses) must develop to the point where objects are posited as existing independently of the mind.
Furthermore (in line with our earlier analyses) the ability to conjure up images of absent objects is a key
impetus behind the formation of this objective hypothesis.
2. Piaget: The above arguments suggest that coherence is a key factor in the development of notions of
objects or noumena. But we would like empirical evidence as well, and for this we turn to Piaget [38].
This renowned child psychologist demonstrated (to the lasting satisfaction of the field, as we noted
earlier) that children are not born with and must acquire notions of objects. Let us briefly outline the
early stages of object concept development revealed by Piaget's research.
STAGES 1 and 2 (about 0 to 4 months): The child recognizes what Piaget calls "pictures"—i.e.
momentary impressions. The child learns to track a moving object with its eyes, and after it disappears
will stare at or look back to the point of disappearance as though this action would make the object
appear.
STAGE 3 (about 4 to 8 months): The child can now predict, to some extent, the itinerary of a moving
object which disappears. For instance, if a toy is dropped on the floor, the child will look to the floor for
it. However, this behavior is still in some sense a dumb reflex, for if Piaget drops the object behind his
own back, the child will search for it in his lap (P. 17). Furthermore, if the initial movement of the
object is not seen or the look to the floor is unsuccessful, the child will stare at the hand which dropped
it as though it might at any moment appear there. The child is also capable of uncovering an object which
has been partially hidden by a screen (such as a blanket) but completely forgets and makes no attempt to
uncover objects which are fully hidden, even though she has the motor skills to do so. Consider the
following observation from this stage:
"I then offer her the doll which is crying. Jacqueline laughs. I hide it behind the fold in the sheet; she whimpers. I make the doll cry; no search. I offer it to her again and put a handkerchief around it; no reaction. I make the doll cry in the handkerchief; nothing." (P. 40)
STAGE 4 (about 8 to 12 months): The child will now lift up or brush aside a screen to retrieve an
object which has been hidden behind it. The object, however, is still not completely constituted. For
firstly, if an object is hidden under blanket A and the child successfully recovers it a few times, and then
the object is hidden under blanket B (while the child is watching intently), the child will search for it
under A. This reaction can be extreme. For example, Piaget hides a toy parrot in his lap by covering it
with his hand, and the child removes the hand to obtain it a few times. Then he:
"... [places] it in plain view on the edge of a table, 50 centimeters away. At the first attempt Jacqueline raises my hand and obviously searches under it, always watching the parrot on the table." (P. 55)
There are also residual reactions during this period. For instance, if Piaget removes a doll hanging over
the child's hammock and hides it behind his back, she will look behind him. But if this look is
unsuccessful, she will return her gaze to the place where the doll previously hung. (P. 62)
STAGE 5 (about 12 to 18 months): The principal hallmark of this period is that objects are searched
for where they were last seen hidden. That is, the child is unable to account for what Piaget calls
"invisible displacements." For example, if a toy is hidden in a box, and the box is covered with a blanket
under which the toy is removed, and then the box is withdrawn from the blanket, the child will search
the box and completely neglect the blanket. The child believes that the object is somehow linked with
the screen into which he last saw it disappear.
STAGE 6 (about 18 to 24 months): The child is now capable of finding the object after invisible
displacements, and has some sense of the object's conservation, i.e. of the fact that it must be
somewhere. Yet the following bizarre reactions from stages 5, 6 and beyond clearly indicate the
lingering discrepancies between the object concepts of adults and children:*
"At 1;3 (9) Lucienne is in the garden with her mother. Then I arrive; she sees me come, smiles at me, therefore obviously recognizes me (I am at a distance of about 1 meter 50). Her mother then asks her: "Where is papa?" Curiously enough, Lucienne immediately turns toward the window of my office where she is accustomed to seeing me and points in that direction. A moment later we repeat the experiment; she has just seen me 1 meter away from her, yet, when her mother pronounces my name, Lucienne again turns toward my office. Here it may be seen that if I do not represent two archetypes to her, at least I give rise to two distinct behavior patterns not synthesized nor exclusive of one another but merely juxtaposed: "papa at his window" and "papa in the garden." At 1;6 (7) Lucienne is with Jacqueline who has just spent a week in bed in a separate room and has gotten up today. Lucienne speaks to her, plays with her, etc., but this does not prevent her, a moment later, from climbing the stairs which lead to Jacqueline's empty bed and laughing before entering the room as she does everyday; therefore she certainly expects to find Jacqueline in bed and looks surprised at her own mistake. At 2;4 (3) Lucienne, hearing a noise in my office, says to me (we are together in the garden): "That is papa up there." Finally, at 3;5 (0) after seeing her godfather off in an automobile, Lucienne comes back into the house and goes straight to the room in which he slept, saying, "I want to see if godfather has left." She enters alone and says to herself, "Yes, he has gone."" (P. 64-65)
Let us take stock of Piaget's results in our context. First we note that each of Piaget's stages is
characterized by a certain hypothesis of the object's location, and an associated motor reaction for
obtaining the object. These hypotheses and reactions are summed up in the following table:
* Note: "1;3 (9)" means "1 year, 3 months, 9 days."
Stage   Hypothesis (object location)                  Motor reaction
1, 2    Where it was last seen                        Position eyes as they were when
                                                      the object was last seen
3       In the direction it was last seen moving      Extend with the eyes the object's
                                                      trajectory
4       Hidden by a particular screen under which     Search under the screen with which
        it has been found before (i.e., blanket A)    the object is associated
5       Where it was last seen hidden                 Search under the screen behind which
                                                      the object was last seen hidden
6       Somewhere in the itinerary of hiding          Methodically search for where it
                                                      could be
Already we can see the kinship with Hume's notion of coherence. For each of the hypotheses in the
sequence makes a prediction about how the world coheres (i.e. how it can be predicted). For example,
the child in stage 1 or 2 feels that the object is associated with a particular positioning of his eyes; he is
almost like a rat in a Skinner box thinking that if he presses the button (looks in the right direction) he
will receive the reward (a picture of his mother). But this notion is clearly erroneous, and the testing,
which the motor reactions amount to, will show discrepancies (just as in the car game). So a new
hypothesis must be fashioned, but that in turn will break down due to failed predictions, and so on. So
what this tells us is that the child proceeds in virtually the same way as the scientist. He is confronted
with a welter of data which he must somehow account for and predict, so he develops a series of
hypotheses, confirmed or rejected through testing, which gradually approximate the objective world.
This vindication of Hume was not lost on Piaget:
"...three criteria seem to us to contribute to the definition of the object peculiar to the sciences: in the first place, every objective phenomenon permits anticipation, in contrast to other phenomenon whose advent, fortuitous and contrary to all anticipation, permits the hypothesis of a subjective origin. But, as subjective phenomena also can give rise to anticipation (for example, the "illusions of the senses") and moreover as unexpected events are sometimes those which mark the failure of an erroneous interpretation and thus entail progress in objectivity, a second condition must be added to the first: a phenomenon is the more objective the more it lends itself, not only to anticipation, but also to distinct experiments whose results are in accordance with it. But that is still not enough, for certain subjective qualities may be linked with constant physical characteristics, as qualitative colors with luminous waves. In this case, only a deduction of the totality succeeds in dissociating the subjective from the objective: only that phenomenon constitutes a real object which is connected in an intelligible way with the totality of a spatio-temporal and causal system (for example, luminous waves constitute objects because they have a physical explanation, whereas quality is dissociated from the objective system). These three methods are found to be the very same which the little child uses in his effort to form an objective world." (P. 97-98)
But let us notice more. A striking feature of the progression delineated by Piaget is that the
original notions of objects are given in terms of relations between movements and perceptions—
i.e., they are imagistic in the sense of the term we have developed. Indeed, Piaget calls the first 18
months of life the "sensori-motor period" (see Beard [1]) because the dominant trend is the
development of "sensori-motor schema," i.e., integrations of perception and movement into
complex wholes, i.e., images. Thus Piaget's work suggests, as we claimed earlier, that images are
the cradle of concepts.
3. Sartre: We have previously discussed the phenomenon of imported imagination, wherein, in
Wittgenstein's phrase, "an image comes into contact with a visual impression." How is this
accomplished? For we are up against the classic problem of memory which I described in the
previous chapter (the Serial Recognition Problem, see P. 55). That is, I have many memory
images, and I must select one which applies to the current experience. But I cannot compare the
current experience with all my images in parallel because each image unfolds differently in time.
Moreover, serial comparison would take an unreasonable amount of time. And I cannot magically
select the "right" image because that is precisely what I am trying to explain.
Sartre addresses this problem by describing how his own image was evoked while
watching the impressionist Franconay mimic the French singer Maurice Chevalier. He
suggests that the retrieval and application of images is guided by what he calls "signs."
These signs are what we earlier (in Chapter 6 "Feature Rings", P. 44) called trademarks.
That is, they are local image structures which are monopolized by larger image structures in
which they are subsumed. A certain hat, a sequined glove, a mannerism—these are all signs.
Sartre illustrates how these signs can evoke and guide an image:
"The artist appears. She wears a straw hat; she protrudes the lower lip, she bends her head forward. I cease to perceive, I read, that is, I make a significant synthesis. The straw hat is at first a simple sign, just as the cap and the silk kerchief of the real singer are signs that he is about to sing an apache song. That is to say, that at first I do not see the hat of Chevalier through the straw hat, but that the hat of the mimic refers to Chevalier, as the cap refers to the "apache sphere." To decipher the signs is to produce the concept "Chevalier." At the same time I am making the judgment: "she is imitating Chevalier." With this judgment the structure of consciousness is transformed. The theme, now, is Chevalier. By its central intention, the consciousness is imaginative, it is a question of realizing my knowledge in the intuitive material furnished me." (Sartre [42], P.36-37)
First we note that watching an impressionist is clearly a case of visual ambiguity or
"seeing-as." As Sartre puts it: "I am always free to see Maurice Chevalier as an image, or a
small woman who is making faces." (P. 36) Thus we might expect to find affinities with the
feature ring theory of visual ambiguity, and indeed we do. Sartre writes:
"An imitation is already a studied model, a simplified representation. It is into these simplified representations that consciousness wants to slip an imaginative intuition. Let us add that these very bare simplified representations—so bare, so abstract that they can be immediately read as signs—are engulfed in a mass of details which seem to oppose this intuition. How is Maurice Chevalier to be found in these fat painted cheeks, that black hair, that feminine body, those female clothes?" (P. 37)
Sartre answers this last question by noting that the discrepancies are neglected and
treated as a ground upon which the confirmations form a figure:
"That black hair we did not notice as being black; that body we did not perceive to be the body of a woman, we did not see those prominent curves... They have a sensible
opaqueness; otherwise they are but a setting." (P. 38)
This suggests, as the feature ring theory of ambiguity predicts, that what we see in
Franconay depends greatly on where we look at Franconay. The aspect of the small woman
is evoked by her hair, clothes and physique; the aspect of Maurice Chevalier is evoked by
the straw hat, the protrusion of the lip and the bend of her head.
We can also perceive affinities here with the car game (i.e., serial recognition, see P.
59): Franconay is the car to be inspected, and Maurice Chevalier is the man's own car. In
both cases, a hypothesis is formed on the basis of signs. The hat is a sign for Chevalier, and
the chain under the seat is a sign for the man's own car. These signs have
information-theoretic significance, as suggested in the previous chapter. That is,
P(Chevalier | hat) and P(my car | chain) are high, while P(another person | hat) and
P(another car | chain) are low, so that the hat and chain are highly-informative features that
are effective in reducing our uncertainty about which image applies. Once we have an
image, we have what amounts to a hypothesis (as we have seen) because the image is a
conglomeration of experiments. We can maintain the "seeing-as" experience by homing in
on those experiments which are confirmed, and avoiding those which reveal discrepancies.
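The claim that signs reduce our uncertainty can be put in explicitly information-theoretic terms. The following sketch computes the information gained from the straw-hat sign; the probability values are invented purely for illustration:

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Before the sign: uncertainty spread over several candidate images.
prior = {"Chevalier": 0.25, "other-1": 0.25, "other-2": 0.25, "other-3": 0.25}

# After seeing the straw hat: P(Chevalier | hat) is high.
posterior = {"Chevalier": 0.85, "other-1": 0.05, "other-2": 0.05, "other-3": 0.05}

# Bits of uncertainty removed by the sign: a highly informative feature
# is precisely one whose observation yields a large gain of this kind.
gain = entropy(prior) - entropy(posterior)
```

Here the hat removes over a bit of uncertainty about which image applies, which is what qualifies it as a trademark in our sense.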
But it may be, sometimes, that the experimentation guided by a certain image reveals
discrepancies which are in turn signs for a new image. In watching Franconay, this occurs
when we fixate on her hair. This is a violation of a prediction of the Chevalier image, but at
the same time it is a positive sign for Franconay and may lead to an evocation of her image
which will then guide the perception. Likewise, in the car game, the Arabic markings on the
tire not only disconfirm one hypothesis, they are a sign for another. And this new
hypothesis immediately reaches out and seizes the "Middle Eastern" man, reinterpreting
him and endowing him with meaning in a higher complex.
I believe I have now shown, to a reasonable extent, the ubiquity of the Hypothesis
Paradigm in human mentation. We have found evidence of it in recognition (the car game),
and have shown in the previous chapter that there is good reason to believe that the vast
majority of recognitions are sequential and governed by this paradigm. We have seen how
the paradigm may underlie our notions of noumena and objectivity (Hume). Furthermore,
we have established a sound connection between hypotheses and images, and shown that
imagination is a prerequisite for having ideas of noumena (Chapter 3-4). We have seen the
Hypothesis Paradigm in operation in the development of the child's concepts of objects
(Piaget), and in the highest realms of scientific thought. And we have uncovered the
paradigm in imported imagination and visual ambiguity (Sartre), which in turn are closely
linked with recognition.
Even taking all this with the utmost sobriety, one is tempted to conclude that here we have
rooted out a basic, general control mechanism of thought. So let us cast the paradigm in
general form. First we have a set of hypotheses which can be applied to experience. Each of
these hypotheses contains experiments which predict how the world will behave, and these
experiments can reveal either confirmations or discrepancies. But there is more to the
picture than this, because hypotheses are not static; they rise and fall, congeal and
disintegrate. Reflecting what we have stressed about schema, there are two fundamental
questions we can ask about this dynamism:
1) Formation: How is an original, prototypical hypothesis formed or toppled through
moment-by-moment experience in time?
2) Application: How do existing hypotheses get applied to and rejected by current
moment-by-moment experience in time?
I shall call these two questions, respectively, the formation and application problems.
Now my main concern in this thesis is imagination, and the solutions I shall essay for these
two problems shall be couched in those terms. But this does not constitute a claim that
hypotheses and images are the same thing. What I do claim is that images are a form of
hypothesis, and that there may be developmental and functional continuities between
images and less concrete conceptual hypotheses. So with that caveat, let us proceed.
Formation: Images are structures of interpretations held together by correlates of
movement, and we have seen that the interpretation-movement-interpretation bond can be
regarded as a prediction or experiment, thus justifying the comparison with hypotheses. So
the formation problem amounts to this: How is an image formed from experience in time?
The Noton and Stark [36] theory of feature-rings furnishes a clue. In that theory, an image
is a directed graph whose nodes are the content of fixations and whose edges are labeled
with eye movements. So there is a clear algorithm for forming such images: Fixate on a part
of a picture, create a node labeled with what you see, extend an edge from that node to a
new node, label the edge with an eye movement, move your eyes according to that eye
movement, fill in the new node with what you see etc. This process is illustrated in the
following diagram:
[Diagram: Noton and Stark Image Formation]
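The formation algorithm just outlined is concrete enough to sketch directly. Here is a minimal rendering; the use of strings as stand-ins for fixation content and eye movements is, of course, a gross simplification of perceptual data:

```python
class FeatureRing:
    """A directed graph: nodes hold fixation content, edges are eye movements."""
    def __init__(self):
        self.edges = {}  # (fixation, movement) -> next fixation

    def record(self, fixation, movement, next_fixation):
        # Bond the two fixations with the eye movement that connects them.
        self.edges[(fixation, movement)] = next_fixation

    def predict(self, fixation, movement):
        # Follow the arc labeled with this movement, if one was recorded.
        return self.edges.get((fixation, movement))

def form_image(scanpath):
    """Build a feature ring from a sequence of (fixation, movement) pairs:
    fixate, record the node, move the eyes, record the edge, and so on."""
    ring = FeatureRing()
    for (fix, move), (next_fix, _) in zip(scanpath, scanpath[1:]):
        ring.record(fix, move, next_fix)
    return ring
```

For example, a scanpath over a face, `form_image([("eye", "down"), ("mouth", "up"), ("eye", None)])`, yields a ring in which the movement "down" from the eye predicts the mouth.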
This is an easy and clean learning technique with two nice features. First, it is obviously
related to the Penfield and Roberts [37] results on cinematographic recording of memory,
and second, since the bonding is subjective, we could potentially use this technique to
explain syncretism in the child. The problem, however, is that the nodes of the feature ring
are iconic copies of fixation content rather than interpretations. So the Noton and Stark
theory cannot account for visual ambiguity, or even the recognition of the same figure of a
different color (such as a negative of the duck-rabbit). To resolve this problem, we
proposed, following Gale and Findlay [14], that the nodes of a feature ring (i.e. image) are
interpretations rather than iconic copies. But this would seem to destroy the utility of the
Noton and Stark formation principle, for we cannot record interpretations. Interpretations
are not a part of the sense data which can be recorded; they must come from an internal
source.
We can begin to extricate ourselves from this problem by considering learning in the
following light. Suppose we take a simple animal, such as a rat, and distinguish between the
"input" impressions it can receive, and the momentary "output" actions it can perform. Also
suppose that we use capital letters to designate the inputs and small letters to designate the
outputs. Then we can use alternating sequences of capital and small letters to describe the
rat's behavior in terms of what it is seeing and doing. (Note that these descriptions are
subjective, i.e. from the rat's point of view.)
We can interpret simple operant conditioning in this format. For example, imagine a rat
in a Skinner box seeing something and moving, seeing something and moving etc. This
input-output chain continues until the rat accidentally hits the button and a pellet of food
drops out. So we infer that the link between the stimulus of seeing the button and the action
of pressing it is strengthened. If we let B = sight of button, p = action of pressing button and
F = food (which is reinforcing), we can say, in our notational framework, that the
occurrence of the sequence BpF should lead the animal to do p when seeing B.
But what does the animal do in the following situation (where E represents electrical
shock)?:
Trial 1: BpF, Trial 2: BpE, Trial 3: BpF, Trial 4: BpE, etc.
If the animal uses standard stimulus-response conditioning to learn in these
circumstances, it will go into an oscillatory fit, for after Trial 1 (BpF), p is strengthened,
just in time for the animal to be punished on the next trial and vice versa.
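The oscillation is easy to exhibit with a toy model of naive stimulus-response strengthening; the unit increment is an arbitrary choice of the sketch:

```python
def run_trials(outcomes, lr=1.0):
    """Naive S-R conditioning: strengthen the B->p link after food (F),
    weaken it after shock (E). Returns the link strength after each trial."""
    strength = 0.0
    history = []
    for outcome in outcomes:
        strength += lr if outcome == "F" else -lr
        history.append(strength)
    return history

# Alternating reward and punishment: BpF, BpE, BpF, BpE ...
print(run_trials(["F", "E", "F", "E"]))  # → [1.0, 0.0, 1.0, 0.0]
```

The link strength simply oscillates; no amount of further experience settles it.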
This alternation of reward and punishment might be just a sadistic experiment by a crazy
professor, but it can be much more interesting than that. For suppose we change the rat's
box so that there are identical buttons at either end; call them L and R. Both buttons have
food dispensers, but food is only delivered from R, and then only if L has been
pressed first. So, letting N = no food, we may imagine the rat meandering around the cage
and experiencing the following sequence:
...RpN...LpN...RpF...RpN...RpN...RpN...LpN...RpF...RpN...RpN...LpN...LpN...LpN...
If we realize that L and R look identical from the rat's perspective, we see that the
above alternating sequence has nothing to do with sadism. The environment is trying to say:
"Pushing R is partly right and partly wrong; there's more to it!"
Now the conditioned response is interesting from the standpoint of image formation,
because it is the germ of an image; images are, in a broad sense, networks of
stimulus-movement-stimulus bonds. Furthermore, the conditioned response is a hypothesis
because it makes a prediction about the world which can fail or succeed. The alternating
sequences we have described are a case in point; the response is alternately confirmed and
disconfirmed. And Piaget, among others, has suggested that disconfirmation is the engine
underlying learning.
So let us see how these ideas might be adapted to generalize the Noton and Stark image
formation process. First of all, in that process, fixation-movement-fixation bonds are always
derived from actual experience, so the recorded relationship must have held at least once.
Therefore, we can expect that many failed predictions will be of the on-again-off-again,
alternating variety indicated above. And any such alternating discrepancy will wreak havoc
with the Noton and Stark recording process.
For the predictive bonds then become diffuse and probabilistic rather than certain. This
situation cannot be resolved by attaching probability weights to the arcs of the feature ring.
Because then, firstly, deciding whether a result is a discrepancy or a confirmation becomes a
slippery matter. If P(AbC)=0.5 and P(AbD)=0.5, and we obtain AbC in
experience, is that a confirmation or a discrepancy? And secondly, simply recording
probabilities will not promote structural understanding. For consider the rat in the box with
two buttons which we spoke of earlier. Then letting B = sight of button (L or R, they look
the same), the rat may compute probabilities like P(BpF)=1/20 and P(BpN)=19/20. These
probabilities obviously give the rat no structural understanding whatsoever; they make the
rat treat the situation like a slot machine, rather than what it really is—a vending machine.
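A small simulation of the two-button box makes the point concrete. Pooling L and R into a single stimulus B, the rat's estimate of P(BpF) reduces the structured contingency to a single rate; modeling the meandering as random pressing is an assumption of the sketch:

```python
import random

def simulate(presses=1000, seed=0):
    """Two identical-looking buttons; R feeds only if L has been
    pressed since the last feeding. Returns the pooled estimate
    of P(BpF), which collapses the structure into one number."""
    rng = random.Random(seed)
    primed = False
    fed = 0
    for _ in range(presses):
        button = rng.choice("LR")
        if button == "L":
            primed = True         # L arms the dispenser
        elif primed:
            fed += 1              # R pressed after L: food is delivered
            primed = False
    return fed / presses

# The pooled probability treats the box as a slot machine;
# the L-before-R rule (the vending-machine structure) is invisible in it.
```

Under random pressing the pooled rate settles near 1/4, yet that number says nothing about the rule governing when food comes.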
So how do we handle alternating discrepancies? To my knowledge no one has offered a
solution of this issue, so I shall here make a sketchy, partial and admittedly speculative
attempt. I believe, like Kant, that the human mental process tacitly assumes that the world is
deterministic—i.e. we are extremely inclined to attribute every effect to a cause. Thus,
when we get an alternating sequence, say AbC...AbD...AbC...AbD etc., we believe that
there is something in the context which is causing the non-determinism.* For example, if
half the time I sit on a chair and it squeaks, and the other half of the time it does not, I
attribute the squeak to some cause.
Perhaps it is the way I am sitting, or the humidity, or someone is switching the chairs on
me and so on. My contention is that this same process occurs in the child's formation of
images. The child begins by recording via the Noton and Stark scheme, but as soon as an
alternating discrepancy occurs (say AbC, AbD), the child splits the iconic A into two
interpretations A1 and A2. These interpretations are such that both can be evoked by the
sensory impression of A, and furthermore the child forms two new deterministic links
(A1bC and A2bD) from the old non-deterministic links (AbC and AbD). So the new
* This, of course, is a variant of the Principle of Sufficient Reason.
representation says that there are two different A's, one which leads by b to C and one
which leads by b to D. Now when the child obtains, say, XxA, both A1 and A2 are evoked.
So the child can take move b, and if, say, C is obtained, the child interprets A as A1 and
builds a link XxA1.
To take a concrete example, suppose that A = sight of an eye, b = eye movement to the
mouth, C = sight of a moustache and D = sight of no moustache. Then the child may obtain
the alternating sequence AbC, AbD when alternately viewing its mother and father. So A
will differentiate into two interpretations: A1 = dad's eye, and A2 = mom's eye. In this way
the child can maintain the predictive power of his images. For once the father image has
been applied, the child knows that AbC, even though father and mother's eyes may be very
similar.
I readily admit that this idea, which I call the splitting principle, is only an indication
and could undoubtedly benefit from a more rigorous empirical and formal treatment. But it
is an answer (at least a partial one) to the image formation problem which not only accords
with our other analyses, but also is amenable to mathematical methods and computer
implementation.
Application: Recall that the application problem concerns how images are evoked and
dismissed by moment-by-moment experience in time. We have already encountered this
problem to some extent in our discussion of imported imagination and feature rings. For
example, in the car game or Sartre's example of the impressionist, we saw that an image
could "invade" or "come alive in" the present perception. Similarly, the feature ring theory
states that the visual exploration of an object is directed by a feature ring recorded when the
object was previously encountered. What we are asking, then, is how an applicable image is
selected, applied and dismissed when it loses its applicability.
Two paradoxes make this problem difficult. One, which we have mentioned earlier (the
Serial Recognition Problem, 6.II.3, P. 55), is that the applicability of a memory image
cannot be determined until the image is applied, so we cannot select an applicable image
prior to the application process. Applicable images must be selected "on the run," while the
application is in progress.
The second paradox (which I shall call the "Part-Whole Paradox"*) derives from the following
situation: The currently applied image determines the perception, and the current perception
determines the applied image. So we have a sort of chicken-egg relationship between top-down and
bottom-up control.
This paradoxical relationship is essentially the same as that obtaining between the
individual and society. For there too the individual both determines and is determined by
the society.† So let us examine for a moment the structure of control within human societies,
and see whether we can, by analogy, learn something useful.
Perhaps the most obvious feature of control in human societies is its free-flowing
malleability, and the myriad forms it can assume. This is reflected in the plethora of terms
for human governmental structures: dictatorship, oligarchy, democracy, feudalism, anarchy,
representative democracy, parliamentary systems etc. But even these terms are
impoverished when one considers the baroque tableau of actual and potential control
relationships. For example, within a democracy there may be top-down hierarchies (armies,
corporations, churches), old-boy networks, executive orders, policy-setting bureaus,
informal pecking orders (i.e. in academia) and so on.
Furthermore, these structures are in a constant state of flux along two different
dimensions. First of all, the actual structure may change. For example, an unchallenged
dictator may grow senile and make increasingly grave errors, thus prompting a crisis of
confidence and coup by coalition. Secondly, the control regime within an existing structure
* I have taken this term from Mackay [29], who uses it to refer to the closely related paradox wherein context determines perception and vice versa.
† In fact, Part-Whole Paradoxes of this type occur quite naturally in a variety of guises, and constitute, I believe, the central problem of top-down/bottom-up control. Surely these paradoxes deserve a much more thorough formal analysis than I shall give here.
may ebb and flow due to external circumstance. For example, an army platoon commander
may be making autonomous decisions in the field when his unit finds critical enemy maps
which he relays to his superior. These flow all the way back to supreme command, where a
completely different strategy is determined and passed back down the chain, resulting in a
new top-down order to the platoon commander.
This flexibility and flux carries over into the domain of images. As Waltz [50] writes:
"A few perceptual clues may suffice for me to "see" that my wife is in the room of our house where I expect to find her. "Seeing" in this case involves a relatively large amount of top-down image construction with relatively little bottom-up processing. On the other hand, a task such as deciding whether I have cleaned all the food off the pot I am washing involves a much larger portion of bottom-up processing." (P. 569)
In other words, the proportions of bottom-up and top-down control in perception are not fixed
beforehand by a rigid regime; the control structure actually mutates through a wide space of possible
forms. This phenomenon is also apparent in the car game. If I go outside in the morning and see my car
where I left it, I recognize it at-a-glance through top-down expectation. But in the car game, I must pay
attention to the object so my perception is more data-driven or bottom-up. Still, as we saw in the car
game, bottom-up control may suggest a new hypothesis which results in a new top-down strategy
(shades of the platoon commander).
What these observations indicate is that images are not embedded in a fixed control regime; that is,
there is not an "executive algorithm" which, for instance, enforces a fixed schedule of bottom-up
control followed by top-down control. Rather, the images themselves should be the loci of control, each
capable of passing information up, receiving information from below, directing subordinates and
receiving direction from superiors.*
* This is an example of Vygotsky's [49] "analysis into units," wherein a complex multiform system is analyzed by breaking it into the smallest parts retaining the basic properties of the whole. This contrasts with what Vygotsky calls "analysis into elements," which in this case would involve splitting bottom-up and top-down control and trying to understand them in isolation. (See Vygotsky [49], P. 3-5.)
So we are maintaining, like Kant, that images are both active and passive. This is not implausible, for
as Casey [5] points out, free-state imagining has both spontaneous and controlled aspects.
On the one hand, images and sequences of images often seem to crop up and run their course
autonomously; and on the other hand, we often dominate our images and force them to conform to our will.
Likewise, when viewing an ambiguous figure like the Necker cube, a certain aspect may autonomously
crop up and defy our attempts to change it, or we may be able to bend the images to our will and shift
between aspects effortlessly.
So assuming that an image is a locus of control, both active and passive, I would like to give a
schematic, but I feel basically correct, account of the image application process which steers clear of the
Serial Recognition and Part-Whole paradoxes.
First of all, I propose that images congeal, through the formation process outlined above, into
fairly distinct units. This occurs because the bonds of predictability which hold the image together
must give way beyond a certain point. For instance, when an infant is confronted with his mother's
face, he may originally integrate the white wall behind her into his image through syncretism. But
this subjective bond will eventually disintegrate as the child begins to move and sees his mother
before different backdrops. So the child’s image of his mother's face will gradually dissociate from
the backdrop and form a local zone of predictability (i.e., an image).
I propose further (although I did not discuss this contingency in the section on formation) that these
images, or local zones of predictability, are each associated with a higher-level interpretation (which I shall
call the image's "label"), and the image as a whole is this label combined with the image's main body. In
this way structures of images can form recursive hierarchies. That is, an image is a structure of
interpretations held together by correlates of movement and is itself an interpretation, so the image can
have component images and can be part of larger images. This structure is illustrated on the following page.
[Diagram: recursive hierarchy of images]
I view the above hierarchy as essentially like the organization chart of an army, through which
bottom-up and top-down control can ebb and flow. This flux is represented by the three states an image in
this hierarchy can assume during the application process: off, on or active. When off, the image plays no
role in the perception; when on, the image has been evoked by information from below; and when active,
the image directs the perception by sequentially activating its subimages.
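For concreteness, the three-state control scheme might be sketched as a small data structure. This is a hypothetical illustration only (the thesis proposes no implementation, and the names here are mine):

```python
from enum import Enum

class State(Enum):
    OFF = 0     # the image plays no role in the perception
    ON = 1      # the image has been evoked by information from below
    ACTIVE = 2  # the image directs the perception from above

class Image:
    """A node in the image hierarchy: a label plus a body of subimages."""
    def __init__(self, label, subimages=()):
        self.label = label
        self.subimages = list(subimages)  # component images; empty if atomic
        self.state = State.OFF

    def activate(self):
        """Top-down control: an active image turns its subimages on in sequence."""
        self.state = State.ACTIVE
        for sub in self.subimages:
            sub.state = State.ON
```

On this sketch, bottom-up evocation corresponds to setting `state` to `ON`, and the "ebb and flow" of control is a matter of which nodes in the hierarchy are `ACTIVE` at a given moment.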
The application process is recursive, involving similar steps for each level of images. Here I shall
present only the first two steps; higher level steps can be extrapolated.
STEP 1: Suppose a person takes in an impression i1. This impression is matched in parallel against a
store of previously recorded impressions. The matching stored impression is associated in memory with a
number of atomic interpretations. Each of these atomic interpretations has a probability which has been
computed through experience. All things being equal, the interpretation with the highest probability (call it
a1) is selected. Now a1 may occur in only a single 1st level image. If so, that image (call it I1) is the only
one which applies, and we proceed to STEP 2. If not, then a1 is turned on in all the 1st level images in
which it occurs. At this point, the applicable image is uncertain, so the 1st level images containing a1
compete to determine the next movement. That movement is chosen which yields the most information, i.e.,
that which can optimally pare down the 1st order candidates through discrepancies. Call the selected
movement m1. m1 is executed yielding a new impression i2. At the same time, the arc labeled m1 is
followed in each 1st level image containing a1. This will yield a set of interpretations, one for each image
containing a1. i2 will fail to satisfy some members of this set, and the associated 1st level images will be eliminated through discrepancies. The remaining 1st level candidates
are pared down, by iterating the above process, until only one 1st level image is left. Call this image I1.
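The paring loop of STEP 1 might be sketched as follows. This is a deliberate simplification under assumed interfaces: `candidates` maps each candidate 1st level image to the interpretation it predicts under each movement, `execute_movement` stands in for actually performing a movement and interpreting the resulting impression, and the "most informative" movement is crudely approximated as the one whose predictions are most varied:

```python
def pare_candidates(candidates, execute_movement):
    """Pare down candidate images through discrepancies (STEP 1 sketch).

    candidates: {image_name: {movement: predicted_interpretation}}
    execute_movement(movement) -> interpretation actually yielded
    Returns the single surviving image name, or None.
    """
    candidates = dict(candidates)
    while len(candidates) > 1:
        movements = {m for preds in candidates.values() for m in preds}
        if not movements:
            return None  # no movements recorded: nothing can discriminate
        # Information proxy: prefer the movement with the most varied predictions.
        spread = lambda mv: len({preds.get(mv) for preds in candidates.values()})
        m = max(movements, key=spread)
        if spread(m) <= 1:
            return None  # all candidates predict alike: indistinguishable
        actual = execute_movement(m)
        # Discrepancies: eliminate every image whose prediction for m fails.
        candidates = {name: preds for name, preds in candidates.items()
                      if preds.get(m) == actual}
    return next(iter(candidates), None)
```

For example, if a "face" image and a "cup" image both contain the initial interpretation, a leftward movement that yields "eye" rather than "handle" eliminates the cup.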
STEP 2: At this point we have determined the applicable 1st level image I1. If there are no 2nd level
images, then we activate I1 and it begins to direct and interpret the perception. If there are 2nd level images,
turn I1 on. Now the label for I1 may occur in only one 2nd level image, I2. If so, then we have found the
2nd level image which applies. If there are no 3rd level images, then I2 is activated and directs the
perception. If there are 3rd level images, then we proceed to STEP 3 etc. Now if I1 occurs in multiple
images at the 2nd level, then the 2nd level image which applies is uncertain. So the 2nd level images must
compete to determine the 2nd level movement. That movement is chosen which maximizes information;
call it m2. m2 is executed, yielding a new impression i3. This input of i3 will initiate a new cycle of STEP 1
processing which yields the next applicable 1st level image, call it J1. Now the label of J1 is compared
against the results of m2 in all the candidate 2nd level images. This results in the paring down of the
candidates through discrepancies, and the 2nd level process continues until we have found a 2nd level
image which applies. Then we proceed to STEP 3 etc.
STEPS like the above are recursively iterated up the image hierarchy until the applicable image at the
highest level has been obtained. That image is then activated, and it begins to direct the perception starting
from its subimage under which the current experience falls. A high-level movement is executed to the most
informative point in the highest level image, yielding a new subimage. That subimage is then activated,
and the activation recursively descends down the hierarchy until atomic interpretations are applied to
incoming impressions.
However, if during this recursive descent of activation a discrepancy occurs, the image subsuming the
discrepancy will fail (i.e. shift from active to off), and another interpretation may be activated, causing the
bottom-up process to initiate again.
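The recursive descent of activation, including failure on a discrepancy, might be sketched as follows (again a hypothetical simplification: images are plain dictionaries, and `interpret` stands in for matching an atomic label against the incoming impression):

```python
def descend(image, interpret):
    """Activate `image` and recursively descend through its subimages.

    image: {"label": ..., "subimages": [...], "state": "off"|"on"|"active"}
    interpret(label) -> True if the incoming impression fits the label.
    Returns False on a discrepancy, flipping the failed image to "off"
    so that a new bottom-up episode can begin.
    """
    image["state"] = "active"
    if not image["subimages"]:           # atomic level: apply the interpretation
        if interpret(image["label"]):
            return True
        image["state"] = "off"           # discrepancy: this image fails
        return False
    for sub in image["subimages"]:       # sequentially activate subimages
        if not descend(sub, interpret):
            image["state"] = "off"       # failure propagates to the subsuming image
            return False
    return True
```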
Obviously the above is only a rough sketch, but even so the underlying principle is quite conceptually
complex. To do justice to this complexity would require a thesis in itself, so here we will rest content with
pointing out two features of this control scheme which are relevant in our context.
i) Visual ambiguity: The system explains two important facets of visual ambiguity—the shifting and
mutual exclusion of interpretations. Shifting occurs when an image fails during the top-down activation
process, and the interpretation that caused the discrepancy, and thus failure, initiates a new episode of
bottom-up activity. Two interpretations must be mutually exclusive because an image can only interpret
the perception when it is active, and the rules of the algorithm ensure that an image never becomes active
until all its competitors have been eliminated.
ii) Malleability: The system accounts for the fact that different perceptions can involve varying
proportions of bottom-up and top-down control. When the system is confronted with an unfamiliar
situation or a situation where expectations are constantly being violated, bottom-up processing will
dominate. On the other hand, when the system is in a situation which has been classified under a
high-level image (i.e. like a man in his own house), many interpretations will be readily given by
top-down expectations.
This concludes my description of how images are applied. I am in no way prepared to claim that the
above process is psychologically accurate or even remotely close to being implementable. What I do claim
is that I am in the right ballpark and something like this must be going on during image application. My
reasons for believing this are as follows. First, there is wide-spread consensus that perception is a
multi-form process of interlaced bottom-up and top-down control, as was pointed out in the previous
chapter. Second, a number of proposed mechanisms for cognitive model control significantly resemble my
account (see Fahlman in Minsky [35], P. 264-267 and Lowe [28]). Third, my mechanism is a
generalization of the feature ring control scheme outlined in the previous chapters on the basis of
reasonably reliable experimental evidence. And fourth, it is difficult to imagine how a process which does not "ebb-and-flow" like mine could avoid the Serial Recognition Problem and Part-Whole Paradox.
In conclusion, I would like to summarize the main points we have made in this chapter.
1) The "Hypothesis Paradigm" seems to be a general feature of human mentation—intimately linked with noumena and recognition, and extending from the infant's first gropings to the highest reaches of science. Moreover, images can usefully be regarded as hypotheses, thus further connecting images with noumena and recognition.
2) The simple recording process of Noton and Stark can be modified via the splitting principle to account, at least partially, for the formation and differentiation of images through discrepancies.
3) The process of image application during perception is a complex interplay of bottom-up and top-down control guided by trademarks, discrepancies and confirmations. Also, this process must somehow overcome the Serial Recognition Problem and Part-Whole Paradox by using an ebb-and-flow principle similar to that of the mechanism we have outlined.
8. Conclusion Let us retrace our steps and take a larger view of the points this thesis has touched on. First
of all, we examined the methodology of AI, and adopted what might be called a "bottom-up"
design philosophy. That is, we resolved to put comprehensive understanding of the human mind
(or at least earnest steps in that direction) first, and formal computational tools second. Clearly,
in this modest exposition we have not succeeded in unraveling the entire knot. But I believe I
have succeeded in what I set out to do—namely to clarify the concept of imagination and
demonstrate its importance.
Our point of departure was "Linearity"—my shorthand for the fact that the mind is a
sequence of moments in time. Thus we challenged the tacit, alluring and all-pervasive objective
viewpoint, and swung to the opposite pole of subjectivity. (I trust that by now the reader knows
what I mean by "subjectivity.") This shift allowed us to see perception as the problem of
organizing our chaotic journey through time via ideal, mind-independent noumena—a process of
transcending subjectivity and awakening to objectivity.
We then proceeded to Hume and Kant, who both proposed that imagination is the means by
which perception is organized, and found their views to be essentially sound. For first, we saw
that the notion of identity (and hence noumena) requires imagination in that we must compare
the present perception with an absent one which has passed on, and second, images from the past
often "come alive" in present perception.
Next we asked: What do the apes lack that prevents them from accumulating culture like
human beings? We argued that they lacked a trick, and all indications, again, pointed to
imagination.
So assuming the importance of imagination, we analyzed the image in depth and found it to
have 3 basic properties:
i) Dynamic organization
ii) Pre-interpretation (i.e. the image is not raw or Humean)
iii) Integration through correlates of movement
With these properties in mind, we discussed the feature ring theory of perception which
originated in eye movement research. We found evidence that memories are, indeed, stored
subjectively in strips, and that these memories have eye movement components. We also found
evidence that feature rings (i.e. images) play a role in recognition, a process which we argued is
sequential in the overwhelming majority of cases.
Further, the sequential nature of recognition suggested the involvement of some sort of
hypothesis formation and evaluation mechanism. So we examined the "Hypothesis Paradigm"
and found it to be a highly general feature of human mentation, linked in particular with three
domains connected with imagery: noumena, recognition and imported imagination. And we
solidified this link by showing how the image could be viewed as a hypothesis.
Finally, we proposed tentative mechanisms to account for the two main problems of the
dynamic behavior of images—how they get into and are applied by the mind in time. In the
former case, we noted a continuity between the conditioned response and images, and described
a discrepancy-driven method of interpretation creation and differentiation. In the latter case, we
noted the malleability of the image application process (i.e. the fact that different perceptions
involve varying proportions of bottom-up and top-down control), and gave a mechanism
exemplifying this property.
Now we are left with a piece of unsettled business. Recall that at the outset of this thesis
we reiterated Dreyfus' [9] question: If the prospects for AI seemed so rosy in the 1950's, what
happened? What unanticipated barrier did researchers come up against? Dreyfus answered this
question by describing four forms of "human information processing" which conventional AI is
at a loss to mimic, and I would like here to discuss three of these forms (the fourth being a
combination of the previous three), and consider whether their imitation might be aided through
imagistic constructs of the type we have developed.
The three forms, which we shall address in turn, are: fringe consciousness, ambiguity
tolerance and essential-inessential discrimination.
1. Fringe consciousness
Dreyfus describes fringe consciousness in the context of chess, and contrasts it with
heuristically-guided search—its conventional AI surrogate. It is perhaps most evident in the fact
that a human chess master, examining on the order of 100-200 potential board positions, can
select a better move than a computer program like, say, Cray Blitz, which examines as many as
10 million.
Dreyfus accounts for this discrepancy by noting that, in protocols where a chess player
thinks aloud, two distinct stages can be discerned—what he calls "zeroing-in" and
"counting-out." Zeroing-in generally occurs at the start of the protocol, and is expressed in
phrases like: "I notice his Rook is undefended." This is then followed by an episode of counting-out, where the player examines a modest, but generally deep, tree of possibilities. So
it seems that humans have the advantageous ability to vastly trim the potential search space by
zeroing-in on the most crucial issue at any point in the game.
Dreyfus attributes this ability to fringe consciousness—a marginal awareness which shapes
our perception without being explicitly considered or excluded. And the way it does this is by
making things look different. For example, past experience, in the form of a vocabulary of
stereotypical chess patterns, makes certain zones of the board look promising or dangerous etc.
Dreyfus writes:
"In general what is needed is an account of the way the background of past experience and the history of the current game can determine what shows up as a figure and attracts a player's attention." (P. 105)
Now we have found that images account for precisely this sort of phenomenon—i.e., where
memory models recorded in the past "invade" or "direct" the present perception. And further, the
eye movement theory of visual ambiguity provides an explanation of how the look of a chess
position can change. So we may reasonably conclude that images have a role to play in fringe
consciousness.
2. Ambiguity tolerance
Dreyfus characterizes ambiguity tolerance as the ability to disambiguate or interpret
incoming data without explicitly considering all possible alternative meanings. He writes: "The sentence is heard in the appropriate way because the context organizes the perception; and since sentences are not perceived except in context they are always perceived with the narrow range of meaning the context confers." (P. 108)
So it seems that here again images have a role to play. For as we saw in the application
mechanism, once an image has been activated, it directs the perception and assigns
interpretations to incoming impressions, thereby eliminating the need to consider alternatives.
3. Essential-Inessential discrimination
Dreyfus equates this ability with insight, and contrasts it with trial-and-error search. Now we have
already obliquely implicated imagination in insight by our remarks on imported imagination in chapter 5
(recall in particular the scientific examples), but let us try to make this connection more explicit.
Suppose that I, like an ape, am placed in a cage with a banana outside the bars which I must obtain. A
problem has been set for me, which can be summarized in the phrase "how to get it." So what happens
when my eyes settle on the stick and I suddenly see that it is a solution to my problem? It seems that I do
not, at the point of insight, entertain images of myself picking up the stick and fishing through the bars
with it. The insight is too quick, and that stage seems to come later. Rather it seems that my mind suddenly
blossoms into an atmosphere or "set" in which the stick and banana play a central role. I have vague
intimations not only of the stick, but also of what I shall do with it, why I shall do it, and what will happen.
But these are surely inchoate intimations. What I have is an insight, which like a seed is simple, yet holds
within itself the directions for development and growth.
Now my contention is that insight can be understood as imported imagination—i.e., as what
Wittgenstein calls "the dawning of an aspect." For example, in the above case, the insight essentially
amounts to an absent sensori-motor schema or image (namely that governing "stick fishing") suddenly
being evoked and "invading" the present perception.
Obviously I cannot prove that this interpretation of insight is correct because insight is not a
well-defined notion. But the following points make the interpretation at least reasonable. First, both insight
and shifting of aspects have a similar "magical" and startling quality. Second, it seems that almost all
scientific insights amount, in the end, to seeing one thing in terms of another; for example, light as waves,
projectiles as parabolas, space as a curved surface, biological cell components as machinery, (Z, +) as
(R, ·) etc. And third, imagination is generally conceded to play a role in breaking free of traditional
blindnesses and bias.
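The (Z, +) example can be made precise: to see one mathematical structure in terms of another is, at bottom, to notice a structure-preserving map. A standard illustration (not part of the original argument) is the embedding which carries addition into multiplication:

```latex
f : (\mathbb{Z}, +) \to (\mathbb{R}^{+}, \cdot), \qquad f(n) = 2^{n},
\qquad f(m + n) = 2^{m+n} = 2^{m} \cdot 2^{n} = f(m) \cdot f(n).
```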
If this equating of insight with imported imagination is valid, then we have succeeded in implicating
imagination in all of Dreyfus' forms of human information processing. So does this mean that imagination
is the great panacea which will solve all our problems? I am hardly so bold as to go that far. What I will
say is what I have been saying all along—that the objective schemata of AI, while suitable representations
of our pure concepts, are totally out of touch with our subjective experience in time, and images seem to
occupy a desirable half-way point between subjectivity and objectivity. Unfortunately I have taken only
baby steps toward connecting these two poles, but that is to be expected. After all, the complete transition
from the sensori-motor intelligence of infancy to formal abstract adult thought takes on the order of 13
years. Nevertheless, I regard the explication of this transition as the outstanding problem which my
research has raised.
Lastly, I would like to list my remaining questions:
1. How does imagination function in thought (as opposed to perception)? How does imagination function
in action? What is the relation between imagination and affect?
2. How are free-state images controlled?
3. How are images created as models for abstract concepts, and how do these models function in thought?
4. How do the image formation and application processes relate?
5. How do noumena relate to our concepts of causality?
6. Why does the objective viewpoint dominate and hide subjectivity? That is, what is the nature of the
"crypto-mechanism"?
References
1. Beard, R. M., An Outline of Piaget's Developmental Psychology for Students and Teachers, New York: Mentor Books, 1969.
2. Biederman, I., Rabinowitz, J. C., Glass, A. L., and Stacy, E. W., "On the Information Extracted from a Glance at a Scene," Journal of Experimental Psychology, 1974, 103, 597-600.
3. Binet, A., L'étude expérimentale de l'intelligence, Paris: Schleicher, 1903.
4. Buettner-Janusch, J., Origins of Man: Physical Anthropology, New York: John Wiley & Sons, 1966.
5. Casey, E. S., Imagining: A Phenomenological Study, Bloomington, Indiana: Indiana University Press, 1976.
6. Chase, W. G., and Simon, H. A., "Perception in Chess," Cognitive Psychology, 1973, 4, 55-81.
7. Coren, S., "The Interaction Between Eye Movements and Visual Illusions," in D. Fisher, R. Monty and J. Senders (Eds.) Eye Movements: Cognition and Visual Perception, New Jersey: Lawrence Erlbaum, 1981, P. 67-81.
8. Dement, W., and Kleitman, N., "Eye Movements During Sleep," Journal of Experimental Psychology, 1957, 53, 339-346.
9. Dreyfus, H. L., What Computers Can't Do [Revised Edition], New York: Harper & Row, 1979.
10. Dreyfus, H. L., and Dreyfus, S. E., Mind over Machine, New York: Free Press, 1986.
11. Dyer, M. G., In-depth Understanding, Cambridge, Massachusetts: MIT Press, 1983.
12. Farley, A. M., "A Computer Implementation of Constructive Visual Imagery and Perception," in R. Monty and J. Senders (Eds.) Eye Movements and Psychological Processes, New Jersey: Lawrence Erlbaum, 1976, P. 499-513.
13. Flavell, J. H., Cognitive Development [2nd Edition], New Jersey: Prentice-Hall, 1985.
14. Gale, A. G., and Findlay, J. M., "Eye Movement Patterns in Viewing Ambiguous Figures," in R. Groner, C. Menz, D. Fisher and R. Monty (Eds.) Eye Movements and Psychological Functions: International Views, New Jersey: Lawrence Erlbaum, 1983, P. 145-168.
15. Goodall, J., The Chimpanzees of Gombe, Cambridge, Massachusetts: Belknap Press, 1986.
16. Groner, R., Walder, F., and Groner, M., "Looking at Faces: Local and Global Aspects of Scanpaths," in A. G. Gale and F. Johnson (Eds.) Theoretical and Applied Aspects of Eye Movement Research, Amsterdam: North-Holland, 1983, P. 522-533.
17. Hasegawa, M., Kishino, H., and Yano, T., "Dating of the Human-Ape Splitting by a Molecular Clock of Mitochondrial DNA," Journal of Molecular Evolution, 1985, 22, 160-174.
18. Hebb, D. O., The Organization of Behavior, New York: John Wiley & Sons, 1949.
19. Hebb, D. O., "Concerning Imagery," Psychological Review, 1968, 75, 466-477.
20. Hochberg, J. E., "In the Mind's Eye," in R. N. Haber (Ed.) Contemporary Theory and Research in Visual Perception, New York: Holt, Rinehart & Winston, 1968, P. 309-331.
21. Hume, D., A Treatise of Human Nature, Oxford: Clarendon Press, 1888.
22. Intraub, H., "Identification and Processing of Briefly Glimpsed Visual Scenes," in [7], P. 181-190.
23. Jaspers, K., General Psychopathology, Trans. by J. Hoenig and M. W. Hamilton, Chicago: University of Chicago Press, 1963.
24. Kant, I., Critique of Pure Reason, Trans. by N. K. Smith, New York: St. Martin's, 1965.
25. Kohler, W., The Mentality of Apes, Trans. by E. Winter, London: Kegan Paul, Trench, Trubner & Co., 1925.
26. Loftus, G. R., "A Framework for a Theory of Picture Recognition," in [12], P. 499-513.
27. Lorenz, K., Behind the Mirror: A Search for a Natural History of Human Knowledge, Trans. by R. Taylor, New York: Harcourt Brace Jovanovich, 1977.
28. Lowe, D., Perceptual Organization and Visual Recognition, Boston: Kluwer Academic, 1985.
29. Mackay, D. G., The Organization of Perception and Action, New York: Springer-Verlag, 1987.
30. Mackworth, N. H., and Morandi, A. J., "The Gaze Selects Informative Details within Pictures," Perception & Psychophysics, 1967, 2, 547-552.
31. Marr, D., Vision, New York: W. H. Freeman & Co., 1982.
32. Merleau-Ponty, M., The Phenomenology of Perception, Trans. by C. Smith, New Jersey: Routledge & Kegan Paul, 1962.
33. Miner, J. B., "A Case of Vision Acquired in Adult Life," Psychological Review Monograph Supplement, 1905, 6, no. 5, 103-118.
34. Minsky, M., "Steps toward Artificial Intelligence," Proceedings of the I.R.E., 1961, 49.
35. Minsky, M., "A Framework for Representing Knowledge," in P. H. Winston (Ed.) The Psychology of Computer Vision, New York: McGraw-Hill, 1975, P. 211-277.
36. Noton, D., and Stark, L., "Eye Movements and Visual Perception," Scientific American, 1971, 224, 35-43.
37. Penfield, W., and Roberts, L., Speech and Brain-Mechanisms, New Jersey: Princeton University Press, 1959.
38. Piaget, J., The Construction of Reality in the Child, Trans. by M. Cook, New York: Ballantine Books, 1954.
39. Pylyshyn, Z. W., "What the Mind's Eye Tells the Mind's Brain: A Critique of Mental Imagery," Psychological Bulletin, 1973, 80, 1-24.
40. Pylyshyn, Z. W., "Imagery and Artificial Intelligence," in C. W. Savage (Ed.) Perception and Cognition: Issues in the Foundations of Psychology, Minnesota Studies in the Philosophy of Science, Vol. 9, Minneapolis: University of Minnesota Press, 1978.
41. Ristau, C. A., and Robbins, D., "Language in the Great Apes: A Critical Review," in J. Rosenblatt, R. A. Hinde, C. Beer, and M. C. Busnel (Eds.) Advances in the Study of Behavior, Vol. 12, New York: Academic Press, 1981.
42. Sartre, J., The Psychology of Imagination, Secaucus, New Jersey: Citadel Press.
43. Senden, M. v., Raum- und Gestaltauffassung bei operierten Blindgeborenen vor und nach der Operation, Leipzig: Barth, 1932.
44. Sibley, C. G., and Ahlquist, J. E., "The Phylogeny of the Hominid Primates, as Indicated by DNA-DNA Hybridization," Journal of Molecular Evolution, 1984, 20, P. 2-15.
45. Sowa, J. F., Conceptual Structures, Reading, Massachusetts: Addison-Wesley, 1984.
46. Stark, L., and Ellis, S., "Scanpaths Revisited: Cognitive Models Direct Active Looking," in [7], P. 193-226.
47. Strawson, P. F., "Imagination and Perception," in L. Foster and J. W. Swanson (Eds.) Experience and Theory, Massachusetts: University of Massachusetts Press, 1970, P. 31-54.
48. Tolman, E. C., "Cognitive Maps in Rats and Men," Psychological Review, 1948, 55, 189-208.
49. Vygotsky, L. S., Thought and Language, Trans. by E. Hanfmann and G. Vakar, Cambridge, Massachusetts: MIT Press, 1962.
50. Waltz, D. L., "On the Function of Mental Imagery," The Behavioral and Brain Sciences, 1979, 2, 569-570.
51. Warnock, M., Imagination, Berkeley and Los Angeles: University of California Press, 1976.
52. Wittgenstein, L., Philosophical Investigations, Trans. by G. E. M. Anscombe, Oxford: Basil Blackwell, 1953.