Ontology of Early Visual Content
For the final version, suitable for citation, see:
http://www.tandfonline.com/eprint/vF9QqrR56BhmxArSnPck/full
Błażej Skrzypulec¹
The main goal of the paper is to sketch an ontological model of visual content at the low- and
medium-level of visual processing, relying on psychological conceptions of vision. It is
argued that influential cognitive models contain assumptions concerning “objects of
content”, i.e. objects whose presence is a necessary condition of visual representations’
adequacy. Subsequently, I present the structure of the objects of content under consideration
and describe how it develops through the perceptual process. In addition, during the course of
the article I present some of the connections between analytic metaphysics and the ontology
of visual content.
Key words: Content; Ontology; Vision; Representations; Metaphysics
The way in which our visual perception presents reality seems to suggest an ontology
of the world as composed of objects located in three-dimensional space, which possess
various properties and can be distinguished as falling into different kinds. Similarly, it is
commonly claimed within contemporary philosophy of perception that visual representations
present the world as being a certain way (e.g., Siegel, 2010; Schellenberg, 2011). In other
words, the content of visual representations specifies objects that have to be present in a
visual field if these representations are to be adequate, and so determines a certain “visual
ontology”. We may ask for a more precise description of such a “visual ontology” and seek to
discover how it is developed through the various stages of the perceptual process. My goal in
this article is to provide a sketch of such an ontology, relying on the assumptions of influential
scientific models of vision. For this reason, I do not restrict the class of visual representations
to conscious perceptual experiences (on which philosophers often focus in discussions
regarding content); instead, I consider the contents of visual representations postulated by
cognitive, psychological models. I argue that such models contain assumptions concerning
objects whose presence is a necessary condition of visual representations’ adequacy. The
analysis of these assumptions allows us to answer the question of how, according to scientific
models of vision, the world would have to be for the representations these models postulate to
be adequate.
1 Błażej Skrzypulec is a PhD student at Jagiellonian University. Correspondence to: Błażej Skrzypulec, Instytut
Filozofii UJ, ul. Grodzka 52, 31-044 Kraków, Poland. Email: [email protected].
I believe that there are at least two reasons why we should be interested in answering
the above question. First, there exist philosophical controversies concerning the ontology of
visual content. For example, Russell applied a version of the bundle theory of objects in the
context of vision by stating that visual objects can be analyzed as certain combinations of
various visual qualities, with special positional qualities among them (Russell, 1956; 2009).
Opposed to this view, defenders of “bare substratum” theories claimed that in seeing two
objects that are qualitatively the same we are visually acquainted not only with features and
locations but also with irreducible particulars (Allaire, 1963). More recently, Austen Clark
(2004) argued that to present an adequate account of visual content a notion of “visual
particulars” is needed, as a mere description in terms of visual features cannot distinguish
situations in which the same features are combined differently. Various authors have proposed
alternative views on the nature of such “visual particulars” (e.g., Cohen, 2004) and have
discussed the usefulness of this notion for explaining visual phenomena, for example those
connected with representing persistence through change (O’Callaghan, 2008). The
postulates of scientific models of vision play an important role in these debates and the
detailed explication of their assumptions concerning a visual ontology can provide arguments
for and against certain philosophical positions.
Second, as is suggested by our common experience, the content of visual
representations is not chaotic, but is organized by certain stable rules. For example, intuitively
we may think that if a hue is represented then it is also represented as spatially located. Such
rules, if they are true, are satisfied by every visual state, no matter what stimuli are received
by the perceptual system. Metaphorically speaking, they constitute a “form” that organizes the
content of all visual representations (Matthen, 2004, pp. 500). In other words, such rules
characterize an ontology that is “implemented” in the mechanism of the perceptual system: no
matter what arrangement of stimuli is received by the visual system, the resulting
representational content will satisfy those principles. Rules constituting the “implemented”
ontology of the visual system may be revealed by investigating the ontological assumptions of
scientific models of vision.
Within this paper I investigate what visual ontology is postulated in psychological
models that describe visual representations. I start by introducing a general notion of
representation and argue that scientific models of vision contain assumptions regarding the
ontology of representational content. I claim that these assumptions characterize “objects of
content”, i.e. objects whose presence is a necessary condition of visual representations’
adequacy. Subsequently, I describe the main statements of Feature Integration Theory and the
Coherence Theory of Attention, and explain why these models are an interesting choice for
the presented case study. In the main part of the paper I characterize the ontology of visual
content suggested by the models I consider, and describe its development through the low-
and mid-levels of vision. In addition, during the course of the article I present some of the
connections between analytic metaphysics and the ontology of visual content here discussed.
1. Representations and Objects of Content
Of course one may doubt whether cognitive models of vision contain any ontological
assumptions at all. It should be clear that what is relevant here is not the attitudes of authors of
scientific models, for example whether they are interested in such ontological assumptions or
whether they deliberately rely on some philosophical conception while formulating their
theories. The important question is about the structure of scientific models: whether they in
fact contain ontological assumptions concerning visual content. Below, I argue for the thesis
that formulating a scientific model of visual representations is closely connected to making
postulates about “objects of content”, i.e. objects whose presence is a necessary condition of
visual representations’ adequacy.
Both the classical theory of representation developed by Charles S. Peirce and
modern conceptions formulated in the context of cognitive science (Marr 2010) propose,
using different terminology, three elements that are crucial for the general notion of
representation. The first we can call the “representational vehicle”. In the most general terms,
the representational vehicle is something in virtue of which an element of the world is
represented. Written words or a series of sounds used in speech are probably the most
intuitive examples of such structures. In David Marr’s conception, the word ‘representation’
refers mainly to the representational vehicle, which is a formal system, composed of simple
symbols that may be arranged within more complex ones according to certain rules (Marr
2010, pp. 20).
The second element is the “object of denotation”. Representations stand for
something, like verbal descriptions may stand for people or pictures may stand for landscapes,
and represent it correctly or not. These “objects of denotation”, i.e. objects for which
representations stand, should be understood very broadly; for example, in the case of
representational vehicles like written words, objects of denotation may be people, states of
affairs, abstract entities, etc. In the context of visual perception, the objects of denotation can
be generally understood as fragments of the environment that in some way interfere with light
coming to the retinas, mainly by producing, reflecting, or absorbing it.
From the above remarks we know that representational vehicles, or representations in
a narrow sense, refer to some objects of denotation, and that these objects may be represented
adequately or not. The role of the third element, “representational content” (in Marr’s
terminology “a description of an entity in a representation” (Marr 2010, pp. 20)), is to specify
the conditions of representations’ adequacy. Representational content determines what has to
be satisfied by an object of denotation if a given representation is to be adequate. Using
another common example, we may imagine that a representational vehicle is an
inscription, say, “This is a tree”. In such a case, intuitively, representational content
determines that if the representation is adequate, then the object of denotation is a tree. In the
case of visual representations, representational content specifies objects whose presence in the
visual field is a necessary condition of visual representations’ adequacy. These objects may
be named “objects of content”.
Relying on the general notion of representation, we may ask which of the three
elements (representational vehicle, object of denotation, or representational content) are
characterized by statements included in cognitive models of vision. Descriptions of visual
representations proposed in such models are often composed of statements like: “A is
represented by B”. For example, in Feature Integration Theory (FIT) it is claimed that at the
beginning of the perceptual process features are represented by activities in certain areas of the
brain called “feature-maps” (e.g. Treisman, 1998, p. 1295). To show that cognitive models of
vision contain ontological statements concerning representational content it is enough to argue
that at least in some cases their statements characterize objects of content.
Of course, the main goal of scientific models is to explain how the visual system
fulfils a certain function by describing a mechanism in virtue of which the considered
function is realized. There are different levels of description that can be used. However, in all
cases describing a mechanism amounts to describing a certain representational vehicle and its
operations. Although characterizing the objects of content is not the main goal of cognitive
models, it is difficult to obtain a proper description of a mechanism without any
characterization of content. In the case of visual representation, the most relevant mechanisms
are responsible for representing some aspects of the environment. To propose an adequate
description of a mechanism, we need to specify what the representational abilities of a
perceptual system are at a given stage of the perceptual process. We have to characterize what
can be represented at a given stage: for example, as in the FIT model, that the visual system
at the beginning of the perceptual process is able to represent the presence of features, but not
their combinations.
In fact, a specification of representational content is a part of “computational theory”
as described by Marr (2010, pp. 23). Such a theory contains, inter alia, rules that characterize
what can be represented by a cognitive system. For example, according to Marr’s postulates
(Marr 2010, pp. 37), at the earliest levels of visual processing the cognitive system is able to
represent discontinuities between certain features (like levels of luminance), but does not
represent regions uniformly “filled-in” by features. Later, these local discontinuities are
represented as composing edges, but only if they are represented as being in a certain spatial
arrangement. The main goal of a cognitive model is then to describe a mechanism responsible
for such representational abilities.
A description of the representational abilities of a visual system does not refer to, or at
least not only to, objects of denotation of visual representations. This is so because
representational abilities of a visual system are different at subsequent stages of the perceptual
process, while objects of denotation, which are sources of visual stimuli, stay the same. What
is more, visual representations can be inadequate, so the same objects may be represented no
matter whether proper objects of denotation are present within the visual field. Because of
this, objects characterized in descriptions of representational abilities seem to be objects of
content, i.e. objects whose presence is necessary for representations’ adequacy rather than
objects that in fact causally interfere with the visual system, and such descriptions entail a certain
visual ontology.
The above argument shows that a characteristic of objects of content is usually needed
in order to satisfy the main goal of cognitive models of vision, i.e. formulating a description
of a mechanism that is responsible for certain representational abilities of a visual system.
While arguing for the presence of ontological assumptions in cognitive models I
frequently referred to the notion of representational adequacy. However, it is important to
note that the project of analyzing the ontological postulates concerning objects of content does
not presuppose any particular conception of visual representations’ adequacy. This general
project is consistent with a strong claim that such representations (or more specifically their
representational vehicles) are composed of language-like structures that are literally true or
false about the environment, as well as with alternative statements according to which visual
representations are picture-like mental models that can achieve a certain mapping with
external entities. In fact it is sufficient to adopt an “instrumentalist” account of representation
in which representations are theoretical entities postulated in cognitive models to construct
successful explanations, without assuming any claims about the real existence of mental
models somehow produced by the brain. The only assumption that is needed for
investigations concerning visual ontology is that cognitive models describe representations as
entities that may model the environment correctly or incorrectly and that their correctness
entails the presence of certain objects.
It should be noted that giving the characteristics of objects of content may not exhaust
the adequacy conditions of representations. For example, not only must a certain feature be
present, it also has to be causally related to a visual system in some appropriate way (see
Coates, 2007, pp. 57). However, in the following article I am concerned only with the
narrower notion of representational content that specifies entities (and relations between
them) whose presence is a necessary condition of visual representations’ adequacy, without
claiming that these necessary conditions are also sufficient.
2. Models and Ontology
Below, I present brief descriptions of the models we shall consider, namely: Feature
Integration Theory (FIT) and Coherence Theory of Attention (CTA). There are several
reasons for choosing FIT and the CTA rather than analyzing other models or adopting a more
general approach that does not focus on any particular conception. First, providing a detailed
analysis of the chosen models helps to show that there are interesting ontological claims
regarding content that can be inferred from cognitive models of vision. Restricting our scope
to two models allows us, given the limited space available here, to refer to particular claims
presented in the considered models. In particular, by looking at FIT, which is a classical point
of reference in discussions of early levels of the visual process, it will be shown that
ontological assumptions are present in a well-known and influential model.
Second, these two models are representative of a wider class of models: those which
characterize visual objects as constructions out of features and locations (in the case of FIT) and
those which describe visual objects as persisting individuals (in the case of the CTA). Even if
these particular models are not entirely accurate, as is often assumed about the classical
version of FIT, their main ontological assumptions are shared by many other important
conceptions. In subsequent parts of the paper I point out the elements that FIT and the CTA
share with other models, for example with Marr’s classical conception and Pylyshyn’s FINST
theory. In addition, I do not assume that the analysis of FIT and the CTA provides a full description of
the objects of content through the perceptual process. Analyzing other models can reveal
additional objects of content, for example categorized objects represented at higher stages of
visual processing.
The third and the most important point is that the choice of FIT and the CTA has a
particular philosophical significance. These two models, which characterize subsequent
phases of the perceptual process, can be used to demonstrate an ontological change regarding
visual content that separates earlier and later visual representations. As will be shown,
according to FIT, objects of content are bundles of features and locations, but in the case of the CTA
they are described as irreducible, numerically different individuals that possess some features.
This difference allows us to formulate a hypothesis that ontologically different objects are
represented at different stages of the visual process. Such a result has direct consequences for
philosophical debates concerning visual content, in which it is discussed whether visual
objects can be adequately described by using a version of a bundle theory of objects, and how
their structure is connected with their ability to persist through change.
Feature Integration Theory, originally presented by Treisman and Gelade (1980), is
one of the best-known models concerning relatively low levels of the visual process. Its
main goal is to explain how visual information, initially computed in different brain areas, is
combined in a way that allows for the perception of objects that have many features and for
the ability to distinguish between those objects that share some features. The question of how
cognitive mechanisms connect various pieces of information about represented objects is
generally known as the “binding problem”. In the context of visual perception, the binding
problem is usually connected with the ability to represent proper combinations of features and
locations of objects. The fact that the binding problem is considered within models of
early vision is already important from the point of view of ontological investigations about
visual content, as it shows that even low-level visual representations represent some
combinations of features. However, cognitive models of vision usually treat visual binding as
a part of the representational vehicle: a process that connects information about the visible
environment that was earlier computed separately (see Roskies 1999 for a short review). The
most influential hypotheses regarding the nature of such a mechanism involve attentional
top-down influences, hierarchical structures of specialized cells, or synchronization of the
activities of different neuron groups (e.g., Schillen, König, 1994).
Despite this, there is also a sense of visual binding that concerns representational
content. The binding within the representational vehicle allows for representing objects that
are bound together (for example, different features like color and shape). Because of this, we
may ask an ontological question about the rules that govern such binding and about the formal
properties of this represented relation. For example, one may ask whether two features
represented as bound together are thereby represented as bound with a common
location, or whether the relation of binding is transitive. Later I call this relation of binding
between objects of content the B-relation, and I investigate what can be inferred about it from
postulates of FIT.
The scope of the second model – Coherence Theory of Attention (Rensink, 2000a) –
partially overlaps with the scope of FIT. According to this model, at the early levels of visual
processing, “proto-objects” are represented. These are assemblies of features similar to those
described in FIT (Rensink 2000a, pp. 22). According to the CTA, proto-objects are volatile: since
they are replaced with every change, they have only a limited ability to persist through time
(Rensink 2000a, pp. 20). However, more stable, persisting objects are also represented at the
level of perceptual processing described by CTA. Every object is related to some proto-
objects and this relation allows objects to possess properties and to have parts (Rensink
2000a, pp. 24). The description of the representational vehicle in CTA is more abstract than in
FIT and does not refer to concrete brain structures. The most important part of the vehicle is
the “nexus”: a mental structure that is created when an attentional mechanism gathers
information from elements that represent proto-objects (Rensink 2000b, pp. 1473). Generally,
a nexus represents that there is an individual object in the environment and also that it has
properties. These more complex representations rely on information from simpler
representations that represent proto-objects. What is more, a nexus is to some degree capable
of storing information, which allows for the representation of the persistence of objects.
The descriptions of representational content proposed in the above models do more
than just characterize the ontological adequacy conditions of some particular representations:
they determine general rules that govern visual content. These rules, such as “every feature
is related to a location”, are satisfied in every visual state that can be produced at a considered
stage of perceptual processing. Because of this, an ontological model of visual content
composed of them designates a class of all possible arrangements of objects of content that
may be represented at a given level. Analyzing such rules allows us to investigate the
ontology “implemented” in the perceptual mechanism that is the same for all perceptual states
at a given level of perceptual processing. In the further sections of this paper I analyze what
“implemented” ontology of content is postulated within the models we have been considering.
3. Ontology of FIT
According to the usual presentations of Feature Integration Theory, it describes two
stages of the perceptual process. At the first stage features and locations are represented while
at the second they are additionally represented as related in a certain way. The vehicle of
representation is composed of feature-maps and the master map of locations – neural
structures whose various elements and activities represent different locations and features.
The relations between features and locations are represented by binding elements of feature
maps with elements of the location-map, which is often understood in terms of synchronized
activity between groups of neurons.
3.1 Features and Locations
Features serve as atomic elements of the content ontology of FIT. They are not built of
simpler entities, but serve rather as a “material” that constitutes more complex structures.
According to FIT, features are divided into dimensions (Treisman, 1982, pp. 197; Treisman,
Gelade, 1980, pp. 98). In most descriptions of FIT, this division is made by referring to
intuitions and by giving examples of different dimensions. Color, orientation, and simple
shapes (like shapes of different letters) are presented as belonging to different dimensions of
features.
One might doubt whether these examples, especially those concerning shapes, present
good candidates for atomic ontological elements. However, if the examples presented in FIT-
related papers are accurate, we can draw two conclusions that are not explicitly stated in the
presentations of FIT. Firstly, it seems that features from the same dimension cannot be
simultaneously related to the same location – if a color, shape, or orientation is related to a
location at a given time, then no other color, shape, or orientation is related to that location at
that time. Secondly, dimensions such as color and dimensions such as orientation or shape
seem to contain two different types of features. Features like color can be related to any
location, but this is not true of features such as orientation and shape. For example, if a
location can be related to the feature of squareness, then it cannot be related to the
triangularity-feature (in principle, and not merely at the same time). In a similar fashion, a
location that is not elongated (e.g. a circular location) cannot be related to any orientation-
feature. Features that can be related only to some locations can be named “emergent features”,
as they seem to supervene on structures of locations.
According to FIT, locations constitute a second type of represented entities, whose
vehicle of representation is the “master-map of locations”. Locations are not
represented as relations between features, but in an absolute way, as fragments of a
three-dimensional space (Treisman, 1982, pp. 198). It is not entirely clear whether visual systems at
the early level represent atomic, point-like locations, or more complex locations which have parts
and may be treated as sums of atomic locations. It is quite plausible that at least some
complex locations may be represented. Many emergent features must be related to complex
locations if they are to be related to anything. What is more, it is claimed that the visual
system represents regions (Treisman, 1998, pp. 1296), which suggests complex rather than
atomic locations. To distinguish atomic locations from complex locations, a notion of the
asymmetric and transitive proper parthood relation may be useful. As opposed to complex
locations, atomic locations do not have proper parts. This property distinguishes locations from
features (as characterized in FIT): locations may have proper parts, but all features are atomic.
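FIT does not say how locations and their parts are encoded, so the following is only a minimal sketch under an assumption of my own: a location is modelled as a non-empty set of hypothetical atomic grid cells, and proper parthood as proper set inclusion. Under this assumption the asymmetry and transitivity of proper parthood can be checked directly, and a location counts as atomic iff no represented location is a proper part of it. The names `Location`, `is_proper_part`, and `is_atomic` are illustrative, not taken from the FIT literature.

```python
from itertools import product

# Hypothetical encoding (not part of FIT): a location is a non-empty
# frozenset of atomic grid cells, and proper parthood is proper inclusion.
Location = frozenset


def is_proper_part(x: Location, y: Location) -> bool:
    """x is a proper part of y iff x is a proper subset of y."""
    return x < y


def is_atomic(loc: Location, domain: list) -> bool:
    """A location is atomic iff no location in the domain is a proper part of it."""
    return not any(is_proper_part(other, loc) for other in domain)


cell_a = Location({(0, 0)})
cell_b = Location({(0, 1)})
region = Location({(0, 0), (0, 1)})       # complex: two cells
big = Location({(0, 0), (0, 1), (1, 0)})  # complex: three cells
domain = [cell_a, cell_b, region, big]

# Asymmetry: there are no x, y with x < y and y < x.
asymmetric = all(
    not (is_proper_part(x, y) and is_proper_part(y, x))
    for x, y in product(domain, repeat=2)
)
# Transitivity: x < y and y < z together imply x < z.
transitive = all(
    is_proper_part(x, z)
    for x, y, z in product(domain, repeat=3)
    if is_proper_part(x, y) and is_proper_part(y, z)
)

print(asymmetric, transitive)                      # True True
print([is_atomic(loc, domain) for loc in domain])  # [True, True, False, False]
```

The empty set is deliberately excluded from the domain, since under set inclusion it would count as a proper part of every location.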
It may be asked whether locations and features should be treated as separate types of
entities, or whether locations constitute one of the feature-dimensions. Some statements
suggest that locations constitute a special type of feature (“locations differ from other
features”; Treisman, 1982, pp. 197). Below I do not make any strong statement concerning
the relation between the location-type of entities and the feature-type of entities. I will speak
about locations as forming a separate type that is distinct from features, mainly because it
allows me to present the material more clearly.
From the above we may formulate some basic ontological characteristics of the
features and locations postulated in FIT model:
(1) Every entity is a feature or a location. No entity is both a feature and a location.
(2) Proper parthood is an asymmetric and transitive relation. An entity x stands in
proper parthood relation to y iff x is a proper part of y.
(3) If something is a feature, then it is an atomic entity – it has no proper parts.
(4) If something is a feature, then it belongs to one and only one feature-dimension.
(5) If something is a location, then it is atomic or complex.
(6) A location is atomic iff it does not have any locations as its proper parts.
(7) A location is complex iff it has at least one location as its proper part.
(8) If two features belong to the same dimension, then they cannot be simultaneously
related to the same location.²
(9) A feature is an emergent feature iff there are locations to which it cannot be
related.
2 Later in the paper, the relation between features and locations is characterized in a more detailed way, under the
name “B-relation”.
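To make rules (1), (3), (4), (8), and (9) more concrete, here is a minimal sketch that checks a single hypothetical visual state, given as a set of (feature, location) pairs, against rules (8) and (9). The shape labels attached to locations and the table of emergent features (`EMERGENT_REQUIRES`) are illustrative assumptions, not part of FIT, which does not specify which locations each emergent feature tolerates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A feature has exactly one dimension (rule (4)) and no parts (rule (3))."""
    name: str
    dimension: str


# Hypothetical table for emergent features (rule (9)): each emergent feature may
# be related only to locations carrying the listed (hypothetical) shape label.
# Features absent from the table, such as colors, fit any location.
EMERGENT_REQUIRES = {"vertical": "elongated", "squareness": "square"}


def violations(state, location_shape):
    """List violations of rules (8) and (9) in a state of (feature, location) pairs.

    location_shape maps each location name to a hypothetical shape label.
    Locations are plain strings, so no entity is both a Feature and a location (rule (1)).
    """
    problems = []
    by_location = defaultdict(list)
    for feature, loc in state:
        by_location[loc].append(feature)
        required = EMERGENT_REQUIRES.get(feature.name)
        if required is not None and location_shape[loc] != required:
            problems.append(f"rule (9): {feature.name} cannot be related to {loc}")
    for loc, feats in by_location.items():
        dims = [f.dimension for f in feats]
        for dim in sorted(set(dims)):
            if dims.count(dim) > 1:
                problems.append(f"rule (8): two {dim} features at {loc}")
    return problems


red = Feature("red", "color")
green = Feature("green", "color")
vertical = Feature("vertical", "orientation")
shapes = {"loc1": "elongated", "loc2": "circular"}

ok_state = {(red, "loc1"), (vertical, "loc1"), (green, "loc2")}
bad_state = {(red, "loc2"), (green, "loc2"), (vertical, "loc2")}

print(violations(ok_state, shapes))  # []
print(violations(bad_state, shapes))
# ['rule (9): vertical cannot be related to loc2', 'rule (8): two color features at loc2']
```

The second state fails twice: an orientation feature is related to a circular location, and two colors share one location.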
According to FIT, locations and features belonging to various dimensions are basic
elements constituting visual content and are represented from the very beginning of the
perceptual process. This assumption is expressed not only within FIT but is widely shared in
other influential models of early visual processing. For example, in classical investigations
concerning the physiology of early vision conducted by Hubel and Wiesel, it is postulated that
the activities of sub-cortical cells on the visual pathway and of cortical cells in area V1 represent
certain localized features, like local differences in luminance (Hubel, Wiesel 1962). Because
of this, the claim presenting features and locations as basic elements of content does not
depend strongly on accepting FIT as a wholly adequate conception of early vision, but rather
expresses a common ontological description present in different scientific models.
3.2 Feature-Bundles
According to FIT, the main goal of the early visual system is to properly relate
features to locations. I postulated above that distinct types of features differ in their abilities to
be related to locations (emergent features may be related only to certain locations) and that
features from a single feature-dimension cannot be simultaneously related to the same
location. It is now time to explicate more precisely what ontological characteristics are
connected with this relation.
Unfortunately, the descriptions of FIT do not contain much information about the link
between locations and features. It is clear that the relation between features and locations is
represented by some sort of bond between elements of feature maps and the master map of
locations (e.g., Treisman, 1996, pp. 172). What is more, the same operation binds not only
elements of feature maps with elements of the location map, but also connects elements of
different feature maps, if they are bound with the same element of the map of locations
(Treisman, Gelade, 1980, pp. 100). This suggests that the visual system represents a single
relation that is able to connect features with locations as well as different features. I call this
relation the B-relation, as it is represented by some type of neural binding at the level of
representational vehicle.
It seems that a single feature may be simultaneously B-related to multiple locations. In
addition, as is explicitly stated in the characteristics of FIT (e.g. Treisman, 1998, pp. 1296), a
single location can also be simultaneously B-related to more than one feature. However,
features and locations are not represented as B-related in random configurations. The goal of
the visual system is to represent that some features are B-related to a single location, and in
virtue of that are also B-related with each other (Treisman, Gelade, 1980, pp. 100). Such
structures, according to FIT (Treisman, 1996, pp. 172; Treisman, Gelade, 1980, pp. 98), serve
as the most basic representations of objects and are individuated by the fact that each one
contains a unique location. These complex entities may be called feature-bundles and can be
defined in the following way:
(10) An entity is a feature-bundle iff it is (1) constituted by exactly one location, and
(2) is constituted by all features that are B-related to this location, and (3) all its constituents
are B-related to each other.
The above definition entails that feature-bundles are individuated by locations.
Obviously, if feature-bundles are constituted by different locations, then these feature-bundles
are different. This is because every feature-bundle is only constituted by a single location. It is
also the case that if feature-bundles are different, then they are constituted by different
locations. Feature-bundles that were different, but were constituted by the same location,
would have to differ in the respective features or patterns of the B-relations. However, if
feature-bundles contain the same location, then they also have to contain the same features, as
feature-bundles contain all features B-related with their locations. Feature-bundles also cannot
differ in patterns of B-relations, for the simple reason that all elements of every feature-bundle
are B-related to each other. Because of this, the sameness of locations is a sufficient condition
for sameness of feature-bundles.
The above facts provide the identity criterion for feature-bundles:
(11) Feature-bundles are different iff they are constituted by different locations.
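Definitions (10) and (11) can be sketched in Python. This is only an illustration under assumptions that are not part of FIT. Entities are encoded as strings. A hypothetical naming convention marks locations with the prefix "loc". The represented B-relation is a set of unordered pairs (frozensets), which builds its symmetry into the encoding. The helper `bind` is likewise hypothetical.

```python
from itertools import combinations


def is_location(entity: str) -> bool:
    # Hypothetical convention: location names start with "loc".
    return entity.startswith("loc")


def bind(location: str, *features: str) -> set:
    """Hypothetical helper: B-relate the features to the location and to each other."""
    return {frozenset(pair) for pair in combinations((location, *features), 2)}


def feature_bundles(b_relation: set) -> dict:
    """Return one bundle per location, following clauses (1)-(3) of definition (10)."""
    entities = {e for pair in b_relation for e in pair}
    bundles = {}
    for loc in sorted(filter(is_location, entities)):
        # Clause (2): every feature B-related to this location.
        features = {e for pair in b_relation if loc in pair for e in pair if e != loc}
        members = features | {loc}
        # Clause (3): all constituents must be pairwise B-related.
        if all(frozenset(p) in b_relation for p in combinations(members, 2)):
            # Keyed by location, so distinct bundles have distinct locations, as in (11).
            bundles[loc] = frozenset(members)
    return bundles


b_relation = bind("loc1", "red", "square") | bind("loc2", "red", "triangle")
for loc, bundle in feature_bundles(b_relation).items():
    print(loc, sorted(bundle))
```

Because each bundle is stored under its unique location, the dictionary keys directly mirror the identity criterion (11). The feature red occurs in both bundles, which matches the observation that a single feature may be B-related to multiple locations.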
What are the formal characteristics of the B-relation? As I stated earlier, the
descriptions of FIT do not give an explicit answer. One might have the sense that the relation
between features and locations should be characterized as the asymmetric relation of
“localization”. However, this is unlikely to be true in this context, where the same relation
may also hold between various features. Because of that, the relation should rather be
characterized as symmetric. What about transitivity? Treating the B-relation as transitive
would entail some unwanted consequences. Firstly, it would result in features from the same
dimension being B-related to a single location. Secondly, it would result in locations being
B-related to each other, but such a possibility is never mentioned in descriptions of FIT. For
example, let’s assume that there is a bundle1 constituted by redness, squareness, and location1, and that
there is a bundle2 constituted by redness, triangularity, and location2. If the considered relation were
transitive, we would get a structure that is constituted by all those entities: location1, location2,
redness, squareness, and triangularity. This would make it impossible to represent the
difference between a situation in which location1 is connected with redness and squareness
and location2 with redness and triangularity, and the opposite case in which location1 is connected with
redness and triangularity and location2 with redness and squareness.
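The worry about transitivity can be checked mechanically. The Python sketch below is illustrative only. It encodes the B-relation as ordered pairs of hypothetical entity labels. It then closes the relation under symmetry and transitivity, ignoring trivial reflexive pairs, using a naive fixed-point loop chosen for brevity rather than efficiency. Finally it compares the two configurations just described.

```python
def symmetric_transitive_closure(pairs: set) -> set:
    """Close a relation (a set of (a, b) tuples) under symmetry and transitivity."""
    closure = set(pairs) | {(b, a) for a, b in pairs}
    while True:
        # Add (a, d) whenever (a, b) and (b, d) are present; reflexive pairs are skipped.
        new = {(a, d) for a, b in closure for c, d in closure if b == c and a != d}
        if new <= closure:
            return closure
        closure |= new


# location1 with redness and squareness; location2 with redness and triangularity.
config_1 = {("loc1", "red"), ("loc1", "square"), ("red", "square"),
            ("loc2", "red"), ("loc2", "triangle"), ("red", "triangle")}
# The opposite case: location1 with triangularity, location2 with squareness.
config_2 = {("loc1", "red"), ("loc1", "triangle"), ("red", "triangle"),
            ("loc2", "red"), ("loc2", "square"), ("red", "square")}

closed_1 = symmetric_transitive_closure(config_1)
closed_2 = symmetric_transitive_closure(config_2)
print(("loc1", "loc2") in closed_1)        # locations become related to each other
print(("square", "triangle") in closed_1)  # two shapes end up bound together
print(closed_1 == closed_2)                # the two configurations collapse into one
```

Because both configurations share redness, every entity ends up related to every other one. The two situations then become indistinguishable.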
For these reasons, the relation between features and locations should rather be
characterized as intransitive:
(12) The B-relation is a symmetric and intransitive relation that connects a feature with a
location or a feature with a feature.
Every account of the binding relation should be able to resolve the so-called “Many
Properties” problem. It is commonly observed that the content of visual states cannot be
adequately described solely in terms of represented features, since this does not allow for
distinguishing perceptual states that are clearly different. This can be demonstrated by the
following illustration. There can be two perceptual states: (I) a first, in which a green square
object is represented in location1 and a red square object is represented in location2, and (II)
a second, in which the situation is reversed: a red square object is represented in location1 and a
green square object in location2. These two states cannot be distinguished just by listing the
represented basic elements, as the list in both cases will be the same, containing: redness,
greenness, squareness, location1, and location2. To resolve the Many Properties problem, a
relation has to be introduced that determines which elements compose more complex objects.
As was noted by Austen Clark (2004, pp. 449), this role cannot be fulfilled by logical
conjunction. The main reason for this is the transitivity of conjunction. As I stated earlier, if
the B-relation were transitive, then inappropriate binding between various locations and
features belonging to the same dimension would occur. In fact, if the B-relation were
conjunction, then in state (I) as well as in state (II) all elements would be conjoined. Because
of this, these two states could not be discerned by using conjunction as a binding relation.
The B-relation I propose is intransitive and symmetric. Given these formal properties,
it is more similar to relations like “compresence” or “coinstantiation” that are proposed in
metaphysical bundle theories of objects. Despite various differences between versions of the
bundle theory, the general claim is that entities constituting an object stand to each other in a
symmetric but intransitive relation (e.g., Demirli, 2010; Ehring, 2001). By using such a
relation, together with the definition of a feature-bundle (see (10)), it is possible to resolve the
“Many Properties” problem and discern states like (I) and (II). In the first state there are
exactly two feature-bundles: one composed of greenness, squareness, and location1, and a
second composed of redness, squareness, and location2. The second state also contains two
bundles, but their elements are arranged differently: one is composed of redness, squareness,
and location1, and the other of greenness, squareness, and location2. What is more,
because location plays the role of individuator in the structure of a feature-bundle (see (10) and
(11)), we get a criterion for deciding how many qualitatively identical feature-bundles are
represented within a single perceptual state.
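A small Python sketch illustrates the point. Entity names are hypothetical, and locations are assumed to be marked by a "loc" prefix. Bundles separate states (I) and (II) where a bare list of elements cannot. Locations also individuate qualitatively identical bundles.

```python
def bundle(*elements: str) -> frozenset:
    return frozenset(elements)


def elements(state: frozenset) -> frozenset:
    """The 'list' of represented basic elements, ignoring how they are bundled."""
    return frozenset().union(*state)


def strip_locations(state: frozenset) -> frozenset:
    """Drop locations from every bundle (hypothetical 'loc' prefix convention)."""
    return frozenset(frozenset(e for e in b if not e.startswith("loc")) for b in state)


state_1 = frozenset({bundle("green", "square", "loc1"), bundle("red", "square", "loc2")})
state_2 = frozenset({bundle("red", "square", "loc1"), bundle("green", "square", "loc2")})
print(elements(state_1) == elements(state_2))  # same basic elements
print(state_1 == state_2)                      # but different feature-bundles

two_red_squares = frozenset({bundle("red", "square", "loc1"), bundle("red", "square", "loc2")})
# Two bundles with their locations; they collapse into one once locations are removed.
print(len(two_red_squares), len(strip_locations(two_red_squares)))
```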
Overall, the occurrence of binding extends the visual ontology (compare to (1)):
(13) Every entity is a feature, or is a location, or is a feature-bundle (constituted by
certain features and a location). No entity can belong to more than one entity-type.
Early visual objects described in other models of vision are also characterized as
variants of the feature-bundles postulated in FIT, i.e. they are basically certain combinations of
features and locations, or more complex structures composed of feature-bundles. One
example of an influential conception is David Marr’s classical theory, in which visual
primitives are, inter alia, local discontinuities, edges, or bars (Marr, 2010, pp. 37). All of these
are structures composed of locations connected with features, where the most important
elements are spatially connected locations that are “filled-in” by incompatible features (like
different levels of luminance) and so designate borders between surfaces. Similarly, in the
important model of early vision proposed by Rock and Palmer, simple objects of content are
characterized as “uniform regions” that are maximal, spatially coherent locations all of whose
parts are “filled-in” by the same feature (Rock, Palmer, 1994). Again, this shows that the full
adequacy of FIT is not a necessary condition for accepting feature-bundles as objects of
content.
3.3 Ontology of FIT – Summary and Philosophical Connections
Relying on the above observations, we can see that according to the FIT model, the visual
system at the very first level of perceptual processing represents the environment as
containing features and locations. Features are atomic elements and are divided into
dimensions, while locations may be atomic or complex. The goal of early vision is to
represent features and locations as standing in the B-relation. Features and locations are
organized into feature-bundles that are composed of exactly one location and one or more
features that are B-related to each other.
The notion of represented feature-bundles is similar to the analysis of objects given by
bundle theories formulated on the grounds of analytic metaphysics. Among various versions
of the bundle theory, that proposed by Russell (2009, pp. 78) seems to be the most similar to
the description of feature-bundles suggested by FIT. Russell claimed that a set of features (or
“qualities” in Russell’s terminology) forms a bundle iff all those features are compresent
(a symmetric and intransitive relation) with each other and there is no other feature that is
compresent with all the features that constitute that set but which does not belong to the set.
Russell applied his conception to the content of the visual field stating that there are
features, like redness, that are compresent with positional features, that is, with different
locations within the visual field (Russell, 1956, pp. 337; Russell, 2009, pp. 230). According to
this proposition, the visual field is organized into bundles that are individuated by positional
features. This characteristic almost completely agrees with the ontology derived from FIT (see
(10)), and is also similar to the claims of other models that describe objects of content as
combinations of features and locations. However, Russell’s conception was devised to
describe the final product of the visual system whereas FIT only describes early visual
representations. In the next sections we shall see whether the ontology of medium-level visual
content also agrees with the postulates of bundle theories.
4. The Ontology of CTA
Rensink’s Coherence Theory of Attention (CTA) describes the “medium level” of
visual processing. At this stage the environment starts to be represented as containing objects
that possess properties, have an internal part-structure, and are able to persist through change.
According to Rensink (2000b, pp. 1473), this is achieved by an attentional mechanism which
gathers information concerning various features in the environment and creates a “nexus” – a
mental structure that represents an individual object possessing properties. In what follows, I
begin by tracing the connection between FIT and CTA. I then describe the ontology of
medium-level content on the basis of Rensink’s model as well as conceptions concerning
visual representations of persistence (e.g., Pylyshyn, 2007).
4.1 Proto-objects and Feature-Bundles
The CTA distinguishes two stages of visual processing. Objects are not represented at
the first stage, but, instead, the environment is modeled as containing more primitive
structures called proto-objects. Rensink does not provide a detailed characterization of proto-
objects. He claims that they are relatively complex assemblies of features, for example
involving orientation and colour, or are some arrangements of edges (Rensink, 2000a, pp. 20,
23; Rensink, 2000b, pp. 1473). In addition, they are not “pixels”, but rather more complex
locations are involved in their structure (Rensink, 2000a, pp. 22). Such a description suggests
that proto-objects can be identified with the feature bundles postulated in FIT (see (10)) or
with some spatial arrangements of feature-bundles:
(14) An entity is a proto-object iff (1) it is identical to a feature-bundle, or (2) is
identical to some proto-objects standing in appropriate spatial relations to each other.
The identity of constituents and the relations between them provides the identity
criterion for proto-objects:
(15) Proto-objects are different iff they are constituted by different entities or their
constituents stand in different spatial relations.
The above remarks suggest that CTA concerns a later level of perceptual
processing than that described by FIT. However, the two models are partly inconsistent
in how they describe representational content. Within CTA
it is clear that objects are not identical with proto-objects – this model assumes that only one
object may be represented at a time, but proto-objects may be represented in great numbers at
every moment (Rensink, 2000a, pp. 23). However, in some works describing FIT it is claimed
that representing assemblies of features is equal to representing “object tokens” (Treisman,
1996, pp. 172). Other psychological models concerning the stages of perceptual process at
which objects start to be represented seem to support CTA in this respect (e.g. Kahneman et
al., 1992; Pylyshyn, 2007; Raftopoulos, 2009). Despite their various differences, they all
agree that represented objects are not just some combinations of features and locations.
In the subsequent paragraphs, I use the term ‘object’ in the manner proposed in CTA.
Of course, the choice of terminology is to some degree conventional, and in some sense
feature-bundles may be called objects, for example, because they are individuals, individuated
by locations. But what is really important is that, according to CTA, at the medium level of
perceptual processing new entities are represented, which are related to feature-bundles but
are not identical to them. Those new entities possess properties (and in this respect differ from
feature-bundles, which do not possess properties but are combinations of features and
locations), have parts, and are able to persist through change. I will refer to such entities as
‘objects’.
4.2 Proto-objects and Objects
As stated above, in CTA objects are not identical to proto-objects, and they are
also not identical with the features and locations that constitute them. Because of this, new
types of entities are represented at the level of perceptual processing described in CTA
(compare this to (13)):
(16) Every entity is a feature, or is a location, or is a proto-object, or is an object. No
entity can belong to more than one entity-type.
However, objects are hardly unrelated to proto-objects. First of all, objects possess
properties, like size, shape, or color (Rensink, 2000a, pp. 23), and there are certain rules that
govern which properties are possessed by an object. In principle, every location and every
feature may be a property of an object. However, if an object possesses a property then this
property is associated with a proto-object (Rensink, 2000a, pp. 24). What is more, an object
does not take on properties from just any proto-object, but only from some subset of all
proto-objects represented at a given time. So it seems that there is a relation which connects
objects with one or a few proto-objects. These claims are further supported by the
description of how objects are represented. A nexus, the structure that represents a single object
(Rensink, 2000a, pp. 25), is “linked” with one or more lower-level structures that represent
proto-objects (Rensink, 2000a, pp. 23; Rensink, 2000b, pp. 1473). This “link” transfers
information from lower-level structures to the nexus allowing it to represent an object as
possessing properties (Rensink, 2000b, pp. 24). This suggests that for every represented
object there are some proto-objects that are represented as standing in a special relation with
that object. In CTA, and also in other models of mid-level vision, it is claimed that objects do
not have to possess all of the properties associated with proto-objects that are related to them.
It is also suggested that objects possessing only location-based properties may be
represented (Rensink, 2000a, pp. 36-37; Rensink, 2000b, pp. 1476). Moreover, in various
situations properties of objects may be represented in a sketchier or in a more detailed way
(e.g. Kahneman, Treisman, Gibbs, 1992, pp. 178). Sometimes it is even stated that the visual
system starts by representing objects without representing that they possess any properties
(Raftopoulos, 2009, pp. 92). What is more, according to CTA, objects have an internal
structure, composed of parts that are proto-objects (Rensink, 2000a, pp. 23-24). In that case
proto-objects play a double role: they supplement objects with properties and also build part-
structures of objects.
This rather complicated picture may be made clearer by distinguishing, one after
another, all of the relations that occur between objects and proto-objects. Firstly, there is a
relation between every object and one or a few proto-objects that is represented by the “link”
between the nexus and lower-level structures (I will call it the L-relation). Such a relation
determines the necessary and sufficient conditions for being an object – it is an entity that
stands in L-relation to some proto-objects:
(17) Something is an object iff it stands in L-relation to at least one proto-object.
Relying on descriptions of CTA, it is hard to specify in detail the formal properties of
the L-relation. It should not be treated as a transitive relation, as this would lead to the conclusion
that proto-objects may be L-related to other proto-objects and that distinct objects may be L-
related to each other.
Secondly, an object can be related to features and locations and in virtue of this
relation these features and locations are properties of this object. It seems plausible that
objects may be also characterized by more complex properties that are not identical to
features and locations, but are connected with structures of certain proto-objects (“structural
properties”). It is less clear, however, whether at the medium-level of visual processing there
is a structural property associated with every proto-object, or if some structural properties can
be represented only at higher levels.
The above relationship is asymmetric and intransitive: objects do not characterize
themselves, objects do not characterize their properties, and two properties of a single object
do not characterize each other. Its formal features, and its function of connecting properties with
objects, make it a perceptual counterpart of the “instantiation” or “characterization” relation that is
known from metaphysical theories of objects (e.g., Lowe, 2006, pp. 22). In the philosophical
literature, the so-called “third man” problem is often associated with instantiation. It seems that
if an object instantiates redness, then the redness also, and quite trivially, instantiates redness.
This entails a regress because there should be another redness that is instantiated both by the
object and by the first redness in virtue of which the object and the first redness are red.
Cognitive models of vision like FIT or CTA avoid such problems simply by characterizing
features as atomic elements, which are represented from the beginning of the perceptual
process and do not need any explanation for what they are. It is a primitive fact that a feature
like redness is red, and there is no other visually represented object or relation in virtue of
which this is the case.
As was noted earlier, the L-relation determines which properties may characterize an
object:
(18) If an object is characterized by a property (a feature, a location, or a structural
property), then there is a proto-object that is L-related to this object and this proto-object is
constituted by the property in question (in the case of features and locations) or is associated
with the considered property in virtue of its structure (in the case of structural properties).
However, the implication in the opposite direction does not hold, as objects do not
have to possess all the properties associated with proto-objects that are L-related to them.
Thirdly, the proto-objects may constitute the part-structures of objects. The
asymmetrical and transitive relation of proper parthood, described earlier (see (2)), may be
used once more to characterize the connection between proto-objects and objects. Although it
is not specified in descriptions of CTA, it seems plausible to suppose that, as in the
case of characterization, a proto-object is a proper part of an object only if it is also L-related
to this object. Standing in the L-relation is not, however, a sufficient condition for
being a proper part. On the other hand, having a proto-object as a proper part can reasonably be treated
as a sufficient condition for being characterized by an appropriate structural property. If this is
the case, the following principles may be introduced:
(19) If a proto-object is a proper part of an object, then this proto-object is L-related
to this object.
(20) If a proto-object is a proper part of an object, then this object is characterized by
a structural property associated with this proto-object (if for the considered proto-object
there is such a property).
In sum, objects are not identical to proto-objects, but every object is L-related to at
least one proto-object. Objects may be characterized by properties associated with proto-
objects that are L-related to them. In addition, proto-objects that are L-related to an object
may serve as its proper parts.
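Principles (17) to (19) can be read as well-formedness constraints on a represented object. The Python sketch below checks them under simplifying assumptions that are not part of CTA. Proto-objects are reduced to named sets of features and locations. Structural properties, and with them principle (20), are left out. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProtoObject:
    name: str
    constituents: frozenset  # features and locations only (simplifying assumption)


@dataclass
class VisualObject:
    l_related: set                                # proto-objects L-related to the object
    properties: set = field(default_factory=set)  # features and locations characterizing it
    parts: set = field(default_factory=set)       # proto-objects that are proper parts


def is_well_formed(obj: VisualObject) -> bool:
    # (17): an object stands in the L-relation to at least one proto-object.
    if not obj.l_related:
        return False
    # (18): every property comes from some L-related proto-object.
    available = set().union(*(p.constituents for p in obj.l_related))
    if not obj.properties <= available:
        return False
    # (19): every proper part is also L-related to the object.
    return obj.parts <= obj.l_related


p1 = ProtoObject("p1", frozenset({"red", "loc1"}))
p2 = ProtoObject("p2", frozenset({"square", "loc2"}))
obj = VisualObject(l_related={p1, p2}, properties={"red"}, parts={p1})
print(is_well_formed(obj))  # well-formed even though "square" is not among its properties
print(is_well_formed(VisualObject(l_related={p1}, properties={"square"})))  # violates (18)
```

The first object shows that the converse of (18) fails. An object need not possess every property available from the proto-objects L-related to it.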
As mentioned earlier, the claim that there are objects of content that are not reducible
to a combination of features and locations is not exclusive to the CTA. In fact, it is a
characteristic feature of models that explain the way in which a visual system represents
persistence. Other such models, like object-file theory (Kahneman et al., 1992) or Pylyshyn’s
FINST model (Pylyshyn, 2007) postulate representational vehicles that represent numerically
distinct objects that can possess features but are not reducible to them. Again, as in the case of
FIT, this shows that the ontology connected with the CTA is a part of a more general picture
within the contemporary cognitive psychology and that it does not strongly depend on the
correctness of all empirical details of the CTA.
4.3 Persisting Objects
Up to this point objects and proto-objects have been considered only from a “static”
perspective. In the next few paragraphs, I will investigate the dynamic aspects of these entities
as they are connected with persistence through change. In addition, those considerations will
allow me to address the more general issue of the individuation of objects. By ‘identity’ I
mean the equivalence relation that satisfies a version of Leibniz’s Law:
(21) If an entity x is identical to an entity y, then for each time T, something is true
about x at T iff it is true about y at T.
It should be noted that the above principle allows that entities existing at different
times may be identical even if they do not possess the same properties or do not have the
same constituents, etc.
According to CTA, proto-objects are volatile (Rensink, 2000a, pp. 20), as every
change in the constituents of a proto-object leads to a replacement of this proto-object with a
new one. In addition, even structural sameness does not guarantee the identity of proto-objects
in diachronic contexts. This is because if one proto-object ceases to exist and a
structurally identical proto-object appears at a later moment, these proto-objects are not
represented as identical. It seems that to be identical, proto-objects need to have the same
structure and to be temporally continuous. This strict identity criterion can be expressed by
the following rule:
(22) A proto-object x is identical to a proto-object y iff there exists a finite ordered
series, the elements of which are proto-objects existing at moments of time such that:
(1) x is the first element of the series and y is the last element of the series, and
(2) if an element of the series exists at time T, then the next element exists at T or at
the subsequent moment T+1, and
(3) every element of the series is composed of the same features, the same locations,
and the same proto-objects, related in the same way.
Two proto-objects satisfy (22) only if they are connected by a chain of proto-objects
that do not differ in any respect. This thesis accommodates the idea that every change in
proto-objects breaks their identity. The above rule also allows us to determine the identity of
proto-objects which exist at the same moment of time. In that case the series can be described
as containing only two proto-objects that exist at the same moment.
In CTA it is postulated that represented objects, as opposed to proto-objects, are able
to persist through change (Rensink, 2000a, pp. 20). In presentations of CTA, it is only
vaguely claimed that objects may change properties and still remain the same. Fortunately, the
psychological literature is rich in examples regarding the identity of objects (e.g., Pylyshyn,
2007; Scholl, 2007). One of the most important paradigms in the investigation of the way in
which the visual system represents persistence is connected with Multiple Object Tracking
experiment (MOT). In the standard version of MOT participants are presented with a set of
uniform objects (e.g., black circles). Some of them (usually 4 to 6) are distinguished as targets
– they might, for example, be indicated by blinking a few times – and others serve as
distractors. Subsequently, all objects start to move in a random manner and the task of
participants is to track the targets. After some time the objects stop moving and the
participants are asked to identify the targets. Usually, the success rate is quite high if the
number of targets does not exceed six.
The MOT experiment has been conducted in various versions and its results allow for
the formulation of some rules that govern the identity of objects. First of all, objects are
identified as being the same so long as they change their position in a way that preserves
spatial continuity and coherence (Scholl, 2007; similar claims can also be found in the literature
on developmental psychology, see Xu, Carey, 1996). The spatial continuity is not
sustained if, for example, an object disappears and a new object, which may be qualitatively
the same in respect of properties other than location, appears in a different place. The
condition concerning spatial coherence is not fulfilled if, for example, an object divides into
several items (Mitroff, Scholl, Wynn, 2004). This is probably also the case when several
objects merge into one. In such cases, identity is not preserved through change. Properties
other than location seem to be less important, and their changes do not affect identity
(Pylyshyn, 2007, pp. 37).
It should be noted that the above principles describe typical cases rather than designate
necessary and sufficient conditions for the determination of the identity of objects. For this
reason, characterizing the rules that govern diachronic identity seems difficult, as these rules
may be very complex; for example, they may consist of a huge list of principles, each
describing some very specific conditions. However, by using the notion of proto-objects, a more
general and simple rule may be proposed. At each time, each object is L-related to at least one
proto-object. Every change that an object may undergo is connected with a change regarding its
properties, or a change regarding its parts, or a change regarding the proto-objects that are
L-related to it. Because an object possesses properties only if they are associated with
proto-objects that are L-related to it, every change of properties is related to a change in the
proto-objects L-related to the object. The identity of objects is sustained when the proto-objects that are
L-related to them stand in what we might call the ‘continuity and coherence relation’ or
CC-relation (reflexive, symmetric and intransitive). For example, object-1 existing at T1 is
identical with object-2 existing at T2 when: (a) the location constituting a proto-object L-related
to object-1 overlaps the location constituting a proto-object L-related to object-2, (b) this
proto-object L-related to object-2 starts to exist at T2, and (c) there is no other proto-object
that starts to exist at T2, whose location overlaps the location of the proto-object
L-related to object-1, and which is not L-related to object-2. It is quite plausible that in some
situations the CC-relation may be described by using the above conditions (a), (b), and (c),
but in other cases different sets of rules will be needed. Because of that a full definition of the
CC-relation would probably consist of a huge range of different sets of conditions suited for
describing continuity and coherence in different empirical cases.
However, without going into specific detail concerning the various changes that
objects may undergo or into the measure of continuity and coherence that can accommodate
all those changes, it can be generally stated that:
(23) An object x is identical to an object y iff there exists a finite ordered series the
elements of which are objects existing at moments of time such that:
(1) x is the first element of the series and y is the last element of the series, and
(2) if an element of the series exists at time T, then the next element exists at T or at
the subsequent moment T+1, and
(3) every two subsequent elements of the series are L-related to proto-objects that
stand in the CC-relation.
If the CC-relation is able to determine the identity relation between objects, then it
cannot relate one proto-object to two or more different proto-objects existing at a single later moment.
Otherwise, because of the transitivity of identity, objects that exist at the same time but are L-related to
significantly different proto-objects (for example, ones containing non-overlapping locations) would
be identified as identical if their proto-objects were CC-related to a single proto-object
that existed earlier. More specifically, it should be assumed that:
(24) If a proto-object exists at moment T, then: (1) it is CC-related to at most one
proto-object existing at the subsequent moment T+1, and (2) it is CC-related to only one
proto-object existing at T, namely itself.
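Rules (23) and (24) can be sketched in Python as follows. The assumptions are illustrative, not part of CTA. Objects record the moment at which they exist and the names of the proto-objects L-related to them. The non-trivial CC-relation is given as a set of hypothetical (earlier, later) pairs, with reflexivity and symmetry added by the helper. How the CC pairs are obtained is left open, as it is in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualObject:
    time: int
    l_related: frozenset  # names of proto-objects L-related to this object at `time`


def cc_related(p: str, q: str, cc: set) -> bool:
    # The CC-relation is reflexive and symmetric; cc lists only the non-trivial pairs.
    return p == q or (p, q) in cc or (q, p) in cc


def respects_24(cc: set) -> bool:
    """Clause (1) of (24): an earlier proto-object has at most one CC-successor."""
    earlier = [p for p, _ in cc]
    return len(earlier) == len(set(earlier))


def object_series_ok(series: list, cc: set) -> bool:
    """Check clauses (2) and (3) of rule (23) for a candidate series of objects."""
    if not series:
        return False
    for prev, nxt in zip(series, series[1:]):
        if nxt.time not in (prev.time, prev.time + 1):  # clause (2)
            return False
        linked = any(cc_related(p, q, cc)
                     for p in prev.l_related for q in nxt.l_related)
        if not linked:                                  # clause (3)
            return False
    return True


cc = {("p1", "p2"), ("p2", "p3")}  # hypothetical (earlier, later) CC pairs
a = VisualObject(0, frozenset({"p1"}))
b = VisualObject(1, frozenset({"p2"}))
c = VisualObject(2, frozenset({"p3"}))
print(object_series_ok([a, b, c], cc))                   # a chain of CC-related proto-objects
print(respects_24(cc), respects_24(cc | {("p1", "p4")}))  # a split of p1 would violate (24)
```

Nothing here requires p1 and p3 to share any constituent. The objects a and c count as identical only because a chain of CC-related proto-objects links them.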
It is worth noting that identical objects existing at different moments may be L-related
to completely different proto-objects, and so may not have any common properties or proper
parts. What is important is that there is a chain of CC-related proto-objects that are L-related
to the considered objects. Conversely, there may be non-identical objects that exist at
different moments and which are L-related to proto-objects that are constituted by exactly the
same features and the same locations. As in the case of proto-objects (see (22)), the
rules (23) and (24) may be used to determine the identity of objects existing at a single
moment. They entail that the objects existing at T are identical iff they are L-related to the
same proto-objects. This is because for any object at moment T, the proto-objects L-related to
it stand in CC-relations to themselves. It is interesting to note that according to the CTA-
ontology objects are not necessarily individuated by their properties, because being L-related
to a proto-object does not constitute a sufficient reason for being characterized by properties
associated with this proto-object (see (18)).
It should be noted that the visual system does not need to store information about past
patterns of the CC-relation between proto-objects in order to identify objects in accordance
with (23). It is sufficient to represent relations occurring between subsequent moments.
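This point can be illustrated with a toy frame-by-frame tracker in Python. The sketch rests on strong assumptions that go beyond the text. Locations are one-dimensional intervals, and each proto-object is L-related to exactly one object. The CC-relation is approximated by the overlap conditions (a) to (c) given earlier in this section, extended with a no-merge check. `Tracker` and `overlaps` are hypothetical names. The tracker keeps only the previous frame, yet the labels it assigns follow chains of CC-related proto-objects.

```python
from itertools import count

Interval = tuple  # hypothetical 1-D location: (start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


class Tracker:
    """Labels objects frame by frame, remembering only the previous frame."""

    def __init__(self) -> None:
        self._fresh = count()
        self._prev = {}  # object label -> location of its proto-object at the previous moment

    def step(self, locations: list) -> list:
        labels = []
        for loc in locations:
            # Condition (a): earlier proto-objects whose location overlaps this one.
            matches = [obj for obj, old in self._prev.items() if overlaps(old, loc)]
            # Condition (c) plus a no-merge check: exactly one predecessor, and that
            # predecessor overlaps no other proto-object in the current frame.
            if len(matches) == 1 and sum(overlaps(self._prev[matches[0]], other)
                                         for other in locations) == 1:
                labels.append(matches[0])         # identity preserved
            else:
                labels.append(next(self._fresh))  # a new object
        self._prev = dict(zip(labels, locations))
        return labels


tracker = Tracker()
print(tracker.step([(0, 2), (5, 7)]))                  # two new objects
print(tracker.step([(1, 3), (6, 8)]))                  # both move continuously
print(tracker.step([(1.5, 2.5), (2.6, 3.5), (6, 8)]))  # the first one splits
```

Nothing in `step` consults frames older than the previous one. The assigned labels nevertheless track identity across the whole sequence, and a split or a merge yields new objects, in line with the MOT findings discussed above.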
4.4 The Ontology of CTA – Summary and Philosophical Connections
The objects of mid-level vision seem to be rather peculiar entities. They are not
identical with proto-objects but are L-related to them, they may possess properties but are not
identical to some collections of them, they may have proper parts but again they are not
identical to the sums of their proper parts. What is more, objects existing at different moments
may be identical without sharing any properties or being related to proto-objects that share
some constituents. However, the identity of objects is grounded in chains of continuous and
coherent proto-objects.
The ontology of content derived from CTA clearly shows that objects, contrary to what
bundle theories claim, are not identical to relational combinations of features, and so Russell’s
proposal does not seem to adequately describe the content of medium-level visual
representations. The ontological model suggested by CTA has properties that bring it closer to
another classical analytic notion of objects: substratum theory. According to substratum
theories, objects cannot be analyzed solely in terms of features and the relations connecting them.
Rather, the ontological structure of objects is also constituted by a non-qualitative substratum
(Casullo, 1982). Usually, this additional element has two functions: it serves as a subject of
properties and it individuates objects. If the visual content ontology derived from CTA is
correct, then at the mid-level of visual processing the content ontology satisfies, in some
respects, the postulates of substratum theories. Firstly, objects are not identical to
combinations of properties, and secondly, they serve as subjects of properties. However,
contrary to standard versions of substratum theories, the identity of objects is not a primitive fact
but rather relies on the continuity and coherence relations between proto-objects.
Because, as noted earlier, models of early vision such as FIT characterize visual
objects in terms of combinations of features and locations, while models of persistence such as
CTA describe them as irreducible, numerically distinct individuals, we face an interesting
ontological transition. This change is not restricted to the particular examples of FIT and
CTA; rather, it marks a distinction between two classes of scientific models of vision
that differ in their descriptions of visual objects. It is tempting to postulate that the distinction
between the two types of visual objects, feature-bundles and irreducible individuals, is
correlated with earlier and later stages of the perceptual process. Complex objects of content
represented by the earliest representations may be adequately described by the conceptual tools
proposed in philosophical bundle theories of objects, but once persistence begins to be
represented, the ontology of visual content includes objects that are more similar to those
postulated in substratum theories.
5. Conclusions
The main goal of this paper was to sketch an ontological model of visual content at the
low- and medium-levels of visual processing. I argued that scientific conceptions of vision
contain assumptions concerning the objects of content, i.e. objects whose presence is a
necessary condition of visual representations’ adequacy, and that we can identify an
ontological model of content upon which they rely. The investigations were based on the
Feature Integration Theory and the Coherence Theory of Attention. However, the proposed
model is not limited to the specific postulates of these two theories; in important respects it is
consistent with assumptions generally held in models of early visual representations and
models of visual persistence.
According to the model I have presented, at the low-level stages of perceptual
processing, locations and features, which are the most basic objects of content, combine into
represented feature-bundles. These ontological structures seem to be similar to those
postulated by analytic bundle theories, and they constitute a type of visual object that is
typically postulated in scientific models of early vision.
At the subsequent levels of perceptual processing, feature-bundles are combined into
proto-objects, and a new class of entities, which includes objects, enters the picture. Objects, due
to their relation with proto-objects, possess features and have proper parts. They are not
identical to features or their combinations, and they are able to persist through change. The
ontology of medium-level vision inspired by CTA is, in important respects, similar to the
ontology proposed by substratum theories. Specifically, in both cases objects are not reducible
to features; they serve as subjects of properties; and they do not have to be individuated by
properties (their identity is, however, determined by relations between proto-objects).
The ontological investigation of CTA suggests the existence of a second type of
visual object, one that cannot be analyzed in terms of combinations of features and
locations and that is typically postulated in models of visual persistence. On the basis of this
observation, one may hypothesize that visual content undergoes an important
ontological transition in the course of the perceptual process: the bundle-like objects of content
represented by the earliest representations are supplemented by irreducible particulars
once the processes involved in representing persistence are activated.
Acknowledgements
The work was supported by the National Science Center (Poland) under grant
2012/05/N/HS1/03408.
References
Allaire, E. B. (1963). Bare Particulars. Philosophical Studies, 12(1/2), 1-8.
Bayne, T. (2009). Perception and the Reach of Phenomenal Content. The Philosophical
Quarterly, 59(236), 385-404.
Casullo, A. (1982). Particulars, Substrata, and the Identity of Indiscernibles. Philosophy of
Science, 49(4), 591-603.
Clark, A. (2004). Feature-placing and proto-objects. Philosophical Psychology, 17(4), 443-
469.
Coates, P. (2007). The Metaphysics of Perception. New York and London: Routledge.
Cohen, J. (2004). Objects, Places, and Perception. Philosophical Psychology, 17(4), 471-495.
Demirli, S. (2010). Indiscernibility and Bundles in Structures. Philosophical Studies, 151(1),
1-18.
Ehring, D. (2001). Temporal Parts and Bundle Theory. Philosophical Studies, 104(2), 163-
168.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive Fields, Binocular Interaction and Functional
Architecture in the Cat’s Visual Cortex. Journal of Physiology, 160, 106-154.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The Reviewing of Object Files: Object-
Specific Integration of Information. Cognitive Psychology, 24, 175-219.
Lowe, E. J. (2006). The Four-Category Ontology: A Metaphysical Foundation for Natural
Science. Oxford: Clarendon Press.
Marr, D. (2010). Vision: A Computational Investigation into the Human Representation and
Processing of Visual Information. Cambridge, MA: MIT Press.
Matthen, M. (2004). Features, Places, and Things: Reflections on Austen Clark’s Theory of
Sentience. Philosophical Psychology, 17(4), 497-518.
Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide and Conquer: How Object Files Adapt
when a Persisting Object Splits into Two. Psychological Science, 15(6), 420-425.
Muller, F. A., & Saunders, S. (2008). Discerning Fermions. The British Journal for the
Philosophy of Science, 59(3), 499-548.
O’Callaghan, C. (2008). Object Perception: Vision and Audition. Philosophy Compass,
3(4), 803-829.
Pylyshyn, Z. W. (2007). Things and Places: How the Mind Connects with the World.
Cambridge, MA: MIT Press.
Raftopoulos, A. (2009). Cognition and Perception: How Do Psychology and Neural Science
Inform Philosophy? Cambridge, MA: MIT Press.
Rensink, R. A. (2000a). The Dynamic Representation of Scenes. Visual Cognition, 7(1/2/3),
17-42.
Rensink, R. A. (2000b). Seeing, Sensing, Scrutinizing. Vision Research, 40(10-12), 1469-
1487.
Roskies, A. L. (1999). The Binding Problem. Neuron, 24, 7-9.
Russell, B. (1956). An Inquiry Into Meaning and Truth. London: George Allen and Unwin
Ltd.
Russell, B. (2009). Human Knowledge: Its Scope and Limits. London and New York:
Routledge.
Schellenberg, S. (2011). Perceptual Content Defended. Noûs, 45(4), 714-750.
Schillen, T. B., & König, P. (1994). Binding by Temporal Structure in Multiple Feature
Domains of an Oscillatory Network. Biological Cybernetics, 70, 397-405.
Scholl, B. J. (2007). Object Persistence in Philosophy and Psychology. Mind and Language,
22(5), 563-591.
Siegel, S. (2010). Do Visual Experiences Have Contents? In B. Nanay (Ed.), Perceiving the
World (pp. 333-368). Oxford University Press.
Treisman, A. (1982). Perceptual Grouping and Attention in Visual Search for Features and for
Objects. Journal of Experimental Psychology: Human Perception and Performance, 8(2),
194–214.
Treisman, A. (1996). The Binding Problem. Current Opinion in Neurobiology, 6(2), 171–178.
Treisman, A. (1998). Feature Binding, Attention and Object Perception. Philosophical
Transactions of the Royal Society B: Biological Sciences, 353, 1295-1306.
Treisman, A. (1999). Solutions to the Binding Problem: Progress through Controversy and
Convergence. Neuron, 24(1), 105–110.
Treisman, A., & Gelade, G. (1980). A Feature-Integration Theory of Attention. Cognitive
Psychology, 12, 97-136.
Treisman, A., & Sato, S. (1990). Conjunction Search Revisited. Journal of Experimental
Psychology: Human Perception and Performance, 16(3), 459–478.
Tye, M. (2000). Consciousness, Color, and Content. Cambridge, MA: MIT Press.
Xu, F., & Carey, S. (1996). Infants’ Metaphysics: The Case of Numerical Identity. Cognitive
Psychology, 30, 111-156.