Ontology of Early Visual Content

For the final version, suitable for citation, see:

http://www.tandfonline.com/eprint/vF9QqrR56BhmxArSnPck/full

Ontology of Early Visual Content

Błażej Skrzypulec1

The main goal of the paper is to sketch an ontological model of visual content at the low- and

medium-level of visual processing, relying on psychological conceptions of vision. It is

argued that influential cognitive models contain assumptions concerning “objects of

content”, i.e. objects whose presence is a necessary condition of visual representations’

adequacy. Subsequently, the structure of considered objects of content is presented and it is

described how it develops through the perceptual process. In addition, during the course of

the article I present some of the connections between analytic metaphysics and the ontology

of visual content.

Key words: Content; Ontology; Vision; Representations; Metaphysics

The way in which our visual perception presents reality seems to suggest an ontology

of the world as composed of objects located in three-dimensional space, which possess

various properties and can be distinguished as falling into different kinds. Similarly, it is

commonly claimed within contemporary philosophy of perception that visual representations

present the world as being a certain way (e.g., Siegel, 2010; Schellenberg, 2011). In other

words, the content of visual representations specifies objects that have to be present in a

visual field if these representations are to be adequate, and so determines a certain “visual

ontology”. We may ask for a more precise description of such a “visual ontology” and seek to

discover how it is developed through the various stages of the perceptual process. My goal in

this article is to provide a sketch of such an ontology, relying on the assumptions of influential

scientific models of vision. Because of that I do not restrict the class of visual representations

to conscious perceptual experiences (on which philosophers often focus in discussions

regarding content), but I consider contents of visual representation that are postulated by

cognitive, psychological models. I argue that such models contain assumptions concerning

objects whose presence is a necessary condition of visual representations’ adequacy. The

1 Błażej Skrzypulec is a PhD student at Jagiellonian University. Correspondence to: Błażej Skrzypulec, Instytut

Filozofii UJ, ul. Grodzka 52, 31-044 Kraków, Poland. Email: [email protected].

analysis of these assumptions allows us to answer the question of how, according to scientific

models of vision, the world would have to be for the representations these models postulate to

be adequate.

I believe that there are at least two reasons why we should be interested in answering

the above question. First, there exist philosophical controversies concerning the ontology of

visual content. For example, Russell applied a version of the bundle theory of objects in the

context of vision by stating that visual objects can be analyzed as certain combinations of

various visual qualities, with special positional qualities among them (Russell, 1956; 2009).

Opposed to this view, defenders of “bare substratum” theories claimed that in seeing two

objects that are qualitatively the same we are visually acquainted not only with features and

locations but also with irreducible particulars (Allaire, 1963). More recently, Austen Clark

(2004) argued that to present an adequate account of visual content a notion of “visual

particulars” is needed, as a mere description in terms of visual features cannot distinguish

situations in which the same features are combined differently. Various authors have proposed

alternative views on the nature of such “visual particulars” (e.g., Cohen, 2004) and have

discussed the usefulness of this notion for explaining visual phenomena, for example

connected with representing persistence through change (O’Callaghan, 2008). The

postulates of scientific models of vision play an important role in these debates and the

detailed explication of their assumptions concerning a visual ontology can provide arguments

for and against certain philosophical positions.

Second, as is suggested by our common experience, the content of visual

representations is not chaotic, but is organized by certain stable rules. For example, intuitively

we may think that if a hue is represented then it is also represented as spatially located. Such

rules, if they are true, are satisfied by every visual state, no matter what stimuli are received

by the perceptual system. Metaphorically speaking, they constitute a “form” that organizes the

content of all visual representations (Matthen, 2004, pp. 500). In other words, such rules

characterize an ontology that is “implemented” in the mechanism of the perceptual system: no

matter what arrangement of stimuli is received by the visual system, the resulting

representational content will satisfy those principles. Rules constituting the “implemented”

ontology of the visual system may be revealed by investigating the ontological assumptions of

scientific models of vision.

Within this paper I investigate what visual ontology is postulated in psychological

models that describe visual representations. I start by introducing a general notion of

representation and argue that scientific models of vision contain assumptions regarding the

ontology of representational content. I claim that these assumptions characterize “objects of

content”, i.e. objects whose presence is a necessary condition of visual representations’

adequacy. Subsequently, I describe the main statements of Feature Integration Theory and the

Coherence Theory of Attention, and explain why these models are an interesting choice for

the presented case study. In the main part of the paper I characterize the ontology of visual

content suggested by the models I consider, and describe its development through the low-

and mid-levels of vision. In addition, during the course of the article I present some of the

connections between analytic metaphysics and the ontology of visual content here discussed.

1. Representations and Objects of Content

Of course one may doubt whether cognitive models of vision contain any ontological

assumptions at all. It should be clear that what is relevant here is not the attitudes of authors of

scientific models, for example whether they are interested in such ontological assumptions or

whether they deliberately rely on some philosophical conception while formulating their

theories. The important question is about the structure of scientific models: whether they in

fact contain ontological assumptions concerning visual content. Below, I argue for the thesis

that formulating a scientific model of visual representations is closely connected to making

postulates about “objects of content”, i.e. objects whose presence is a necessary condition of

visual representations’ adequacy.

Both the classical theory of representations developed by Charles S. Peirce and the

modern conceptions formulated in the context of cognitive science (Marr 2010) propose,

using different terminology, three elements that are crucial for the general notion of

representation. The first we can call the “representational vehicle”. In the most general terms,

the representational vehicle is something in virtue of which an element of the world is

represented. Written words or a series of sounds used in speech are probably the most

intuitive examples of such structures. In David Marr’s conception, the word ‘representation’

refers mainly to the representational vehicle, which is a formal system, composed of simple

symbols that may be arranged within more complex ones according to certain rules (Marr

2010, pp. 20).

The second element is the “object of denotation”. Representations stand for

something, like verbal descriptions may stand for people or pictures may stand for landscapes,

and represent it correctly or not. These “objects of denotation”, i.e. objects for which

representations stand, should be understood very broadly, for example, in the case of

representational vehicles like written words, objects of denotation may be people, states of

affairs, abstract entities, etc. In the context of visual perception the objects of denotation can

be generally understood as fragments of the environment that in some way interfere with light

coming to retinas, mainly by producing, reflecting, or absorbing it.

From the above remarks we know that representational vehicles, or representations in

a narrow sense, refer to some objects of denotation, and that these objects may be represented

adequately or not. The role of the third element, “representational content” (in Marr’s

terminology “a description of an entity in a representation” (Marr 2010, pp. 20)), is to specify

the conditions of representations’ adequacy. Representational content determines what has to

be satisfied by an object of denotation if a given representation is to be adequate. Using

another common example we may imagine that a representational vehicle is an

inscription, say, “This is a tree”. In such a case, intuitively, representational content

determines that if the representation is adequate, then the object of denotation is a tree. In the

case of visual representations, representational content specifies objects whose presence in the

visual field is the necessary condition of visual representations’ adequacy. These objects may

be named “objects of content”.

Relying on the general notion of representation we may ask which of the three

elements (representational vehicle, object of denotation, or representational content) are

characterized by statements included in cognitive models of vision. Descriptions of visual

representations proposed in such models are often composed of statements like: “A is

represented by B”. For example, in Feature Integration Theory (FIT) it is claimed that at the

beginning of the perceptual process features are represented by activities in certain areas of the

brain called “feature-maps” (e.g. Treisman, 1998, p. 1295). To show that cognitive models of

vision contain ontological statements concerning representational content it is enough to argue

that at least in some cases their statements characterize objects of content.

Of course, the main goal of scientific models is to explain how the visual system

fulfils a certain function by describing a mechanism in virtue of which the considered

function is realized. There are different levels of description that can be used. However, in all

cases describing a mechanism amounts to describing a certain representational vehicle and its

operations. Although characterizing the objects of content is not the main goal of cognitive

models, it is difficult to obtain a proper description of a mechanism without having any

characterization of content. In the case of visual representation, the most relevant mechanisms

are responsible for representing some aspects of the environment. To propose an adequate

description of a mechanism we need to specify what the representational abilities of a

perceptual system are at a given stage of the perceptual process. We have to characterize what

can be represented at a given stage; for example, as in the FIT model, that the visual system

at the beginning of the perceptual process is able to represent the presence of features, but not

their combinations.

In fact, a specification of representational content is a part of “computational theory”

as described by Marr (2010, pp. 23). Such a theory contains, inter alia, rules that characterize

what can be represented by a cognitive system. For example, according to Marr’s postulates

(Marr 2010, pp. 37), at the earliest levels of visual processing the cognitive system is able to

represent discontinuities between certain features (like levels of luminance), but does not

represent regions uniformly “filled-in” by features. Later, these local discontinuities are

represented as composing edges, but only if they are represented as being in a certain spatial

arrangement. The main goal of a cognitive model is then to describe a mechanism responsible

for such representational abilities.

A description of the representational abilities of a visual system does not refer to, or at

least not only to, objects of denotation of visual representations. This is so because

representational abilities of a visual system are different at subsequent stages of the perceptual

process, while objects of denotation, which are sources of visual stimuli, stay the same. What

is more, visual representations can be inadequate, so the same objects may be represented no

matter whether proper objects of denotation are present within the visual field. Because of

this, objects characterized in descriptions of representational abilities seem to be objects of

content, i.e. objects whose presence is necessary for representations’ adequacy rather than

objects that in fact causally interfere with the visual system, and such descriptions entail a certain

visual ontology.

The above argument shows that a characterization of objects of content is usually needed

in order to satisfy the main goal of cognitive models of vision, i.e. formulating a description

of a mechanism that is responsible for certain representational abilities of a visual system.

While arguing for the presence of ontological assumptions in cognitive models I

frequently referred to the notion of representational adequacy. However, it is important to

note that the project of analyzing the ontological postulates concerning objects of content does

not presuppose any particular conception of visual representations’ adequacy. This general

project is consistent with a strong claim that such representations (or more specifically their

representational vehicles) are composed of language-like structures that are literally true or

false about the environment, as well as with alternative statements according to which visual

representations are picture-like mental models that can achieve a certain mapping with

external entities. In fact it is sufficient to adopt an “instrumentalist” account of representation

in which representations are theoretical entities postulated in cognitive models to construct

successful explanations, without assuming any claims about the real existence of mental

models somehow produced by the brain. The only assumption that is needed for

investigations concerning visual ontology is that cognitive models describe representations as

entities that may model the environment correctly or incorrectly and that their correctness

entails the presence of certain objects.

It should be noted that giving the characteristics of objects of content may not exhaust

the adequacy conditions of representations. For example, not only must a certain feature be

present, it also has to be causally related to a visual system in some appropriate way (see

Coates, 2007, pp. 57). However, in the following article I am concerned only with the

narrower notion of representational content that specifies entities (and relations between

them) whose presence is a necessary condition of visual representations’ adequacy, without

claiming that these necessary conditions are also sufficient.
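
The distinction between necessary and sufficient conditions drawn above can be illustrated with a minimal Python sketch. It is an illustration only, under stated assumptions: the function name `necessary_condition_met`, the encoding of a visual field as a set of labels, and the example objects are hypothetical and are not taken from any of the models discussed.

```python
def necessary_condition_met(objects_of_content, visual_field):
    """Return True when every object of content is present in the visual field.

    Both arguments are plain sets of labels (an assumption made for illustration).
    A True result does not establish adequacy: further conditions, such as an
    appropriate causal relation to the visual system, are not modelled here.
    """
    return set(objects_of_content) <= set(visual_field)


# Hypothetical example: the content requires a red feature at location "p".
content = {("red", "p")}
print(necessary_condition_met(content, {("red", "p"), ("green", "q")}))  # True
print(necessary_condition_met(content, {("green", "q")}))                # False
```

The check can only rule a representation out as inadequate; passing it leaves open whether the remaining adequacy conditions are met.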

2. Models and Ontology

Below, I present brief descriptions of the models we shall consider, namely: Feature

Integration Theory (FIT) and Coherence Theory of Attention (CTA). There are several

reasons for choosing FIT and the CTA rather than analyzing other models or adopting a more

general approach that does not focus on any particular conceptions. First, providing a detailed

analysis of chosen models helps to prove that there are interesting ontological claims

regarding content that can be inferred from cognitive models of vision. Restricting our scope

to two models allows us to refer, given the limited space available here, to particular claims

presented in the considered models. In particular by looking at FIT, which is a classical point

of reference in discussions of early levels of the visual process, it will be shown that

ontological assumptions are present in a well-known and influential model.

Second, these two models are representative of a wider class of models: those which

characterize visual objects as constructions out of features and locations (in the case of FIT) and

those which describe visual objects as persisting individuals (in the case of the CTA). Even if

these particular models are not entirely accurate, as is often assumed about the classical

version of FIT, their main ontological assumptions are shared among many other important

conceptions. In subsequent parts of the paper I point out the elements that FIT and the CTA

share with other models, for example with Marr’s classical conception and Pylyshyn’s FINST

theory. In addition, I do not assume that analysis of FIT and CTA provides a full description of

the objects of content through the perceptual process. Analyzing other models can reveal

additional objects of content, for example categorized objects represented at higher stages of

visual processing.

The third and the most important point is that the choice of FIT and the CTA has a

particular philosophical significance. These two models, which characterize subsequent

phases of the perceptual process, can be used to demonstrate an ontological change regarding

visual content that separates earlier and later visual representations. As will be shown,

according to FIT, objects of content are bundles of features and locations; but in the case of the CTA

they are described as irreducible, numerically different individuals that possess some features.

This difference allows us to formulate a hypothesis that ontologically different objects are

represented at different stages of the visual process. Such a result has direct consequences for

philosophical debates concerning visual content, in which it is discussed whether visual

objects can be adequately described by using a version of a bundle theory of objects, and how

their structure is connected with the ability to persist through change.

Feature Integration Theory, originally presented by Treisman and Gelade (1980), is

one of the best-known models concerning relatively low levels of the visual process. Its

main goal is to explain how visual information, initially computed in different brain areas, is

combined in a way that allows for the perception of objects that have many features and for

the ability to distinguish between those objects that share some features. The question of how

cognitive mechanisms connect various pieces of information about represented objects is

generally known as the “binding problem”. In the context of visual perception, the binding

problem is usually connected with the ability to represent proper combinations of features and

localizations of objects. The fact that the binding problem is considered within models of

early vision is already important from the point of view of ontological investigations about

visual content, as it shows that even low-level visual representations represent some

combinations of features. However, cognitive models of vision usually treat visual binding as

a part of the representational vehicle: a process that connects information about the visible

environment that was earlier computed separately (see Roskies 1999 for a short review). The

most influential hypotheses regarding the nature of such a mechanism involve attentional

top-down influences, hierarchical structures of specialized cells, or synchronization of the

activities of different neuron groups (e.g., Schillen, König, 1994).

Despite this, there is also a sense of visual binding that concerns representational

content. The binding within the representational vehicle allows for representing objects that

are bound together (for example, different features like color and shape). Because of this, we

may ask an ontological question about the rules that govern such binding and about the formal

properties of this represented relation. For example, one may ask whether it is the case that if two

features are represented as bound together, then they are represented as bound with a common

location, or whether the relation of binding is transitive. Later I call this relation of binding

between objects of content the B-relation, and I investigate what can be inferred about it from

postulates of FIT.

The scope of the second model – Coherence Theory of Attention (Rensink, 2000a) –

partially overlaps with the scope of FIT. According to this model, at the early levels of visual

processing, “proto-objects” are represented. These are assemblies of features similar to those

described in FIT (Rensink 2000a, pp. 22). According to CTA, proto-objects are volatile: as

they are replaced with every change they have a limited ability to persist through time

(Rensink 2000a, pp. 20). However, more stable, persisting objects are also represented at the

level of perceptual processing described by CTA. Every object is related to some proto-

objects and this relation allows objects to possess properties and to have parts (Rensink

2000a, pp. 24). The description of the representational vehicle in CTA is more abstract than in

FIT and does not refer to concrete brain structures. The most important part of the vehicle is

the “nexus”: a mental structure that is created when an attentional mechanism gathers

information from elements that represent proto-objects (Rensink 2000b, pp. 1473). Generally,

a nexus represents that there is an individual object in the environment and also that it has

properties. These more complex representations rely on information from simpler

representations that represent proto-objects. What is more, a nexus is to some degree capable

of storing information, which allows for the representation of the persistence of objects.

The descriptions of representational content proposed in the above models do more

than just characterize the ontological adequacy conditions of some particular representations:

they determine general rules that govern visual content. These rules, such as “every feature

is related to a location”, are satisfied in every visual state that can be produced at a considered

stage of perceptual processing. Because of this, an ontological model of visual content

composed of them designates a class of all possible arrangements of objects of content that

may be represented at a given level. Analyzing such rules allows us to investigate the

ontology “implemented” in the perceptual mechanism that is the same for all perceptual states

at a given level of perceptual processing. In the further sections of this paper I analyze what

“implemented” ontology of content is postulated within the models we have been considering.

3. Ontology of FIT

In its usual presentations, Feature Integration Theory describes two

stages of the perceptual process. At the first stage features and locations are represented while

at the second they are additionally represented as related in a certain way. The vehicle of

representation is composed of feature-maps and the master map of locations – neural

structures whose various elements and activities represent different locations and features.

The relations between features and locations are represented by binding elements of feature

maps with elements of the location-map, which is often understood in terms of the

synchronization of activities between groups of neurons.

3.1 Features and Locations

Features serve as atomic elements of the content ontology of FIT. They are not built of

simpler entities, but serve rather as a “material” that constitutes more complex structures.

According to FIT, features are divided into dimensions (Treisman, 1982, pp. 197; Treisman,

Gelade, 1980, pp. 98). In most descriptions of FIT, this division is made by referring to

intuitions and by giving examples of different dimensions. Color, orientation, and simple

shapes (like shapes of different letters) are presented as belonging to different dimensions of

features.

One might doubt whether these examples, especially those concerning shapes, present

good candidates for atomic ontological elements. However, if the examples presented in FIT-

related papers are accurate, we can draw two conclusions that are not explicitly stated in the

presentations of FIT. Firstly, it seems that features from the same dimension cannot be

simultaneously related to the same location – if a color, shape, or orientation is related to a

location at a given time, then no other color, shape, or orientation is related to that location at

that time. Secondly, dimensions such as color and dimensions such as orientation or shape

seem to contain two different types of features. Features like color can be related to any

location, but this is not true of features such as orientation and shape. For example, if a

location can be related to the feature of squareness, then it cannot be related to the

triangularity-feature (in principle, and not merely at the same time). In a similar fashion, a

location that is not elongated (e.g. a circular location) cannot be related to any orientation-

feature. Features that can be related only to some locations can be named “emergent features”,

as they seem to supervene on structures of locations.

According to FIT, locations constitute a second type of represented entities, whose

vehicle of representation is the “master-map of locations”. According to FIT, locations are not

represented as relations between features, but in an absolute way, as fragments of a 3-

dimensional space (Treisman, 1982, pp. 198). It is not entirely clear whether visual systems at

an early level represent atomic, point-like locations, or more complex locations which have parts

and may be treated as sums of atomic locations. It is quite plausible that at least some

complex locations may be represented. Many emergent features must be related to complex

locations if they are to be related to anything. What is more, it is claimed that the visual

system represents regions (Treisman, 1998, pp. 1296), which suggests complex rather than

atomic locations. To distinguish atomic locations from complex locations, a notion of the

asymmetric and transitive proper parthood relation may be useful. As opposed to complex

locations, atomic locations do not have proper parts. This property distinguishes locations from

features (as characterized in FIT): locations may have proper parts, but all features are atomic.

It may be asked whether locations and features should be treated as separate types of

entities, or whether locations constitute one of the feature-dimensions. Some statements

suggest that locations constitute a special type of feature (“locations differ from other

features”; Treisman, 1982, pp. 197). Below I do not make any strong statement concerning

the relation between the location-type of entities and the feature-type of entities. I will speak

about locations as forming a separate type that is distinct from features, mainly because it

allows me to present the material more clearly.

From the above we may formulate some basic ontological characteristics of the

features and locations postulated in the FIT model:

(1) Every entity is a feature or a location. No entity is both a feature and a location.

(2) Proper parthood is an asymmetric and transitive relation. An entity x stands in

the proper parthood relation to y iff x is a proper part of y.

(3) If something is a feature, then it is an atomic entity – it has no proper parts.

(4) If something is a feature, then it belongs to one and only one feature-dimension.

(5) If something is a location, then it is atomic or complex.

(6) A location is atomic iff it does not have any locations as its proper parts.

(7) A location is complex iff it has at least one location as its proper part.

(8) If two features belong to the same dimension, then they cannot be simultaneously

related to the same location. 2

2 Later in the paper, the relation between features and locations is characterized in a more detailed way, under the

name “B-relation”.

(9) A feature is an emergent feature iff there are locations to which it cannot be

related.
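
Characteristics (1) to (9) can be checked against a toy example. The following Python sketch is only an illustration under stated assumptions, not part of FIT: all function names, the string labels, and the `admissible` table (which lists, for each feature, the locations to which it can be related) are hypothetical. Characteristic (4) is built into the encoding, since the `dimension` dictionary assigns each feature exactly one dimension.

```python
from itertools import combinations


def is_asymmetric(rel):
    # (2): if x is a proper part of y, then y is not a proper part of x.
    return all((y, x) not in rel for (x, y) in rel)


def is_transitive(rel):
    # (2): if x is a proper part of y and y of z, then x is a proper part of z.
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)


def atomic_locations(locations, proper_part):
    # (6): a location is atomic iff no location is a proper part of it.
    wholes = {whole for (_, whole) in proper_part}
    return set(locations) - wholes


def same_dimension_conflicts(bindings, dimension):
    # (8): report locations to which two features of one dimension are related.
    conflicts = []
    for (f1, l1), (f2, l2) in combinations(sorted(bindings), 2):
        if l1 == l2 and dimension[f1] == dimension[f2]:
            conflicts.append((l1, f1, f2))
    return conflicts


def emergent_features(admissible, locations):
    # (9): a feature is emergent iff there is a location it cannot be related to.
    return {f for f, allowed in admissible.items()
            if not set(locations) <= set(allowed)}


# Hypothetical toy state: "r" is a complex location with atomic parts "p" and "q".
locations = {"p", "q", "r"}
proper_part = {("p", "r"), ("q", "r")}            # (part, whole) pairs
dimension = {"red": "color", "green": "color", "vertical": "orientation"}
admissible = {"red": locations, "green": locations, "vertical": {"r"}}
bindings = {("red", "p"), ("green", "p"), ("vertical", "r")}

print(set(dimension).isdisjoint(locations))             # (1): True
print(is_asymmetric(proper_part), is_transitive(proper_part))  # True True
print(sorted(atomic_locations(locations, proper_part)))  # ['p', 'q']
print(same_dimension_conflicts(bindings, dimension))     # [('p', 'green', 'red')]
print(emergent_features(admissible, locations))          # {'vertical'}
```

In this toy state the only violation is of (8): two colors are related to the location "p" at the same time. The modal force of "cannot" in (8) and (9) is approximated here by an explicit table, which is a simplifying assumption.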

According to FIT, locations and features belonging to various dimensions are basic

elements constituting visual content and are represented from the very beginning of the

perceptual process. This assumption is expressed not only within FIT but is widely shared in

other influential models of early visual processing. For example, in classical investigations

concerning the physiology of early vision conducted by Hubel and Wiesel, it is postulated that

the activities of sub-cortical cells on the visual pathway and cortical cells in area V1 represent

certain localized features, like local differences in luminance (Hubel, Wiesel 1962). Because

of this, the claim presenting features and locations as basic elements of content does not

depend strongly on accepting FIT as a wholly adequate conception of early vision, but rather

expresses a common ontological description present in different scientific models.

3.2 Feature-Bundles

According to FIT, the main goal of the early visual system is to properly relate

features to locations. I postulated above that distinct types of features differ in their abilities to

be related to locations (emergent features may be related only to certain locations) and that

features from a single feature-dimension cannot be simultaneously related to the same

location. Now it is a good time to explicate more precisely what ontological characteristics are

connected with this relation.

Unfortunately, the descriptions of FIT do not contain much information about the link

between locations and features. It is clear that the relation between features and locations is

represented by some sort of bond between elements of feature maps and the master map of

locations (e.g., Treisman, 1996, pp. 172). What is more, the same operation binds not only

elements of feature maps with elements of the location map, but also connects elements of

different feature maps, if they are bound with the same element of the map of locations

(Treisman, Gelade, 1980, pp. 100). This suggests that the visual system represents a single

relation that is able to connect features with locations as well as different features. I call this

relation the B-relation, as it is represented by some type of neural binding at the level of

representational vehicle.

It seems that a single feature may be simultaneously B-related to multiple locations. In

addition, as is explicitly stated in the characteristics of FIT (e.g. Treisman, 1998, pp. 1296), a

single location can be also simultaneously B-related to more than one feature. However,

features and locations are not represented as B-related in random configurations. The goal of

the visual system is to represent that some features are B-related to a single location, and in

virtue of that are also B-related with each other (Treisman, Gelade, 1980, pp. 100). Such

structures, according to FIT (Treisman, 1996, pp. 172; Treisman, Gelade, 1980, pp. 98), serve

as the most basic representations of objects and are individuated by the fact that each one

contains a unique location. These complex entities may be called feature-bundles and can be

defined in the following way:

(10) An entity is a feature-bundle iff it is (1) constituted by exactly one location, and

(2) is constituted by all features that are B-related to this location, and (3) all its constituents

are B-related to each other.

The above definition entails that feature-bundles are individuated by locations.

Obviously, if feature bundles are constituted by different locations, then these feature-bundles

are different. This is because every feature-bundle is only constituted by a single location. It is

also the case that if feature-bundles are different, then they are constituted by different

locations. Feature-bundles that were different, but were constituted by the same location,

would have to differ in the respective features or patterns of the B-relations. However, if

feature-bundles contain the same location, then they also have to contain the same features, as

feature-bundles contain all features B-related with their locations. Feature-bundles also cannot

differ in patterns of B-relations, for the simple reason that all elements of every feature-bundle

are B-related to each other. Because of this, the sameness of locations is a sufficient condition

for sameness of feature-bundles.

The above facts provide the identity criterion for feature-bundles:

(11) Feature-bundles are different iff they are constituted by different locations.
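Definition (10) and criterion (11) can be made concrete in a short Python sketch. It rests on assumptions that are not in the source: entities are plain strings, the B-relation is stored as a set of unordered pairs, and the names `b_related` and `bundle_at` are hypothetical. A location with no B-related features is treated as yielding no bundle, which is one possible reading.

```python
from itertools import combinations

def b_related(x, y, b_relation):
    # The B-relation is symmetric, so each related pair is stored unordered.
    return frozenset((x, y)) in b_relation

def bundle_at(location, features, b_relation):
    """Return the feature-bundle at `location` per definition (10), else None."""
    # Condition (2): all features that are B-related to this location.
    members = {f for f in features if b_related(f, location, b_relation)}
    if not members:
        return None
    # Condition (3): all constituents are B-related to each other.
    mutually_bound = all(b_related(f, g, b_relation)
                         for f, g in combinations(sorted(members), 2))
    if not mutually_bound:
        return None
    # Condition (1): exactly one location is added to the bundle.
    return frozenset(members | {location})

b_relation = {frozenset(pair) for pair in [
    ("redness", "location1"),
    ("squareness", "location1"),
    ("redness", "squareness"),
]}
all_features = {"redness", "squareness", "greenness"}

print(bundle_at("location1", all_features, b_relation))
print(bundle_at("location2", all_features, b_relation))
```

Because the function is keyed by a location and collects every feature bound to it, at most one bundle can be obtained per location, which mirrors the identity criterion (11).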

What are the formal characteristics of the B-relation? As I stated earlier, the

descriptions of FIT do not give an explicit answer. One might have the sense that the relation

between features and locations should be characterized as the asymmetric relation of

“localization”. However, this is unlikely to be true in this context, where the same relation

may also hold between various features. Because of that the relation should be rather

characterized as symmetric. What about transitivity? Treating the B-relation as transitive

would entail some unwanted consequences. Firstly, it would lead to the relation of features

from the same dimension to a single location. Secondly, it would lead to the relation of

locations to each other – but such a possibility is never mentioned in descriptions of FIT. For

example, let’s assume that there is a bundle1 constituted by redness, squareness, location1, and that

there is a bundle2 constituted by redness, triangularity, location2. If the considered relation were

transitive, we would get a structure that is constituted by all those entities: location1, location2,

redness, squareness, and triangularity. This would not allow for the representation of the

difference between a situation in which location1 is connected with redness and squareness,

and location2 with redness and triangularity, and the opposite case in which location1 is connected with

redness and triangularity, and location2 with redness and squareness.

For these reasons, the relations between features and locations should rather be

characterized as intransitive:

(12) B-relation is a symmetric and intransitive relation that connects a feature with a

location or a feature with a feature.
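The unwanted consequence of transitivity can be checked mechanically. The sketch below is an illustration only: it takes the two bundles from the example, treats the B-relation as if it were symmetric and transitive, and computes the resulting classes of mutually related entities. The function name `transitive_closure_classes` and the string labels are assumptions made for the example.

```python
def transitive_closure_classes(pairs):
    """Partition entities into classes under the symmetric, transitive closure of `pairs`."""
    classes = []
    for a, b in pairs:
        merged = {a, b}
        rest = []
        for cls in classes:
            if cls & merged:
                merged |= cls      # the pair links this class to the new one
            else:
                rest.append(cls)
        classes = rest + [merged]
    return classes

# bundle1: redness, squareness, location1; bundle2: redness, triangularity, location2
pairs = [
    ("redness", "location1"), ("squareness", "location1"), ("redness", "squareness"),
    ("redness", "location2"), ("triangularity", "location2"), ("redness", "triangularity"),
]

for cls in transitive_closure_classes(pairs):
    print(sorted(cls))
```

Under the transitive reading the shared feature redness links everything together, so the two bundles collapse into a single class that contains both locations and both shapes.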

Every account of the binding relation should be able to resolve the so-called “Many

Properties” problem. It is commonly observed that the content of visual states cannot be

adequately described only in terms of represented features, since this does not allow for

distinguishing perceptual states that are clearly different. This can be demonstrated by the

following illustration. There can be two perceptual states: (I) a first, in which a green square

object is represented in location1 and a red square object is represented in location2, (II) and a

second in which the situation is reversed: a red square object is represented in location1 and a

green square object in location2. These two states cannot be distinguished just by listing the

represented basic elements, as the list in both cases will be the same, containing: redness,

greenness, squareness, location1, and location2. To resolve the many properties problem a

relation has to be introduced that determines which elements compose more complex objects.

As was noted by Austen Clark (2004, pp. 449), this role cannot be fulfilled by logical

conjunction. The main reason for this is the transitivity of conjunction. As I stated earlier, if

the B-relation were transitive, then inappropriate binding between various locations and

features belonging to the same dimension would occur. In fact, if the B-relation were

conjunction then in state (I) as well as in state (II) all elements would be conjoined. Because

of this, these two states cannot be discerned by using conjunction as a binding relation.

The B-relation I propose is intransitive and symmetric. Having these formal properties

it is more similar to relations like “compresence” or “coinstantiation” that are proposed in

metaphysical bundle theories of objects. Despite various differences between versions of the

bundle theory, the general claim is that entities constituting an object stand to each other in a

symmetric but intransitive relation (e.g., Demirli, 2010; Ehring, 2001). By using such a

relation, together with the definition of a feature-bundle (see (10)), it is possible to resolve the

“Many Properties” problem and discern states like (I) and (II). In the first state there are

exactly two feature bundles: one composed of greenness, squareness, and location1, and a

second composed of redness, squareness, and location2. The second state also contains two

bundles, but their arrangement of elements is different to that in state (I). What is more,

because location plays the role of individuator in a structure of a feature-bundle (see (10) and

(11)) we get a criterion for deciding how many qualitatively identical feature-bundles are

represented within a single perceptual state.
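The contrast between listing elements and listing bundles can be shown in a few lines of Python. This is a minimal sketch under the assumption that a perceptual state is modelled as a set of feature-bundles and each bundle as a set of string labels; the names `elements`, `state_I`, and `state_II` are hypothetical.

```python
def elements(state):
    """Flat collection of the basic elements represented in a state."""
    return set().union(*state)

# State (I): green square at location1, red square at location2.
state_I = {
    frozenset({"greenness", "squareness", "location1"}),
    frozenset({"redness", "squareness", "location2"}),
}
# State (II): red square at location1, green square at location2.
state_II = {
    frozenset({"redness", "squareness", "location1"}),
    frozenset({"greenness", "squareness", "location2"}),
}

print(elements(state_I) == elements(state_II))  # comparison of element lists
print(state_I == state_II)                      # comparison of bundle structures
```

The element lists coincide, while the sets of bundles differ, so only the bundle-level description discerns the two states.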

Overall, the occurrence of binding extends the visual ontology (compare to (1)):

(13) Every entity is a feature, or is a location, or is a feature-bundle (constituted by

certain features and a location). No entity can belong to more than one entity-type.

Early visual objects described in other models of vision are also characterized as

variants of feature-bundles postulated in FIT, i.e. they are basically certain combinations of

features and locations, or more complex structures composed of feature-bundles. One

example of an influential conception is David Marr’s classical theory, in which visual

primitives are, inter alia, local discontinuities, edges, or bars (Marr 2010, pp. 37). All of these

are structures composed of locations connected with features, where the most important

elements are spatially connected locations that are “filled-in” by incompatible features (like

different levels of luminance) and so designate borders between surfaces. Similarly, in the

important model of early vision proposed by Rock and Palmer, simple objects of content are

characterized as “uniform regions” that are maximal, spatially coherent locations all of whose

parts are “filled-in” by the same feature (Rock, Palmer 1994). Again this shows that the full

adequacy of FIT is not a necessary condition for accepting feature-bundles as objects of

content.

3.3 Ontology of FIT – Summary and Philosophical Connections

Relying on the above observations, we can see that according to the FIT model, visual

systems at the very first level of perceptual processing represent the environment as

containing features and locations. Features are atomic elements and are divided into

dimensions, while locations may be atomic or complex. The goal of early vision is to

represent features and locations as standing in the B-relation. Features and locations are

organized into feature-bundles that are composed of exactly one location and one or more

features that are B-related to each other.

The notion of represented feature-bundles is similar to the analysis of objects given by

bundle theories formulated on the grounds of analytic metaphysics. Among various versions

of the bundle theory, that proposed by Russell (2009, pp. 78) seems to be the most similar to

the description of feature-bundles suggested by FIT. Russell claimed that a set of features (or

“qualities” in Russell’s terminology) forms a bundle iff all those features are compresent

(a symmetric and intransitive relation) with each other and there is no other feature that is

compresent with all the features that constitute that set, but which does not belong to the set.
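Russell's condition, as paraphrased above, can be written schematically. The notation is an assumption introduced here for illustration: S is a set of qualities, C(x, y) stands for "x is compresent with y", and Bundle(S) abbreviates "S forms a bundle".

```latex
\[
\mathrm{Bundle}(S) \iff
\forall x, y \in S \,\bigl(x \neq y \rightarrow C(x, y)\bigr)
\;\wedge\;
\neg \exists z \,\bigl(z \notin S \wedge \forall x \in S \; C(z, x)\bigr)
\]
```

The first conjunct expresses mutual compresence of all members, and the second expresses maximality: nothing outside the set is compresent with every member.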

Russell applied his conception to the content of the visual field stating that there are

features, like redness, that are compresent with positional features that are different

localizations within the visual field (Russell, 1956, pp. 337; Russell, 2009, pp. 230). According to

this proposition, the visual field is organized into bundles that are individuated by positional

features. This characteristic almost completely agrees with the ontology derived from FIT (see

(10)), and is also similar to the claims of other models that describe objects of content as

combinations of features and locations. However, Russell’s conception was devised to

describe the final product of the visual system whereas FIT only describes early visual

representations. In the next sections we shall see if the ontology of medium-level visual

content also agrees with the postulates of bundle theories.

4. The Ontology of CTA

Rensink’s Coherence Theory of Attention (CTA) describes the “medium level” of

visual processing. At this stage the environment starts to be represented as containing objects

that possess properties, have an internal part-structure, and are able to persist through change.

According to Rensink (2000b, pp. 1473), this is achieved by an attentional mechanism which

gathers information concerning various features in the environment and creates a “nexus” – a

mental structure that represents an individual object possessing properties. In what follows, I

begin by tracing the connection between FIT and CTA. I then describe the ontology of

medium level content on the basis of Rensink’s model as well as conceptions concerning

visual representations of persistence (e.g., Pylyshyn, 2007).

4.1 Proto-objects and Feature-Bundles

The CTA distinguishes two stages of visual processing. Objects are not represented at

the first stage, but, instead, the environment is modeled as containing more primitive

structures called proto-objects. Rensink does not provide a detailed characterization of proto-

objects. He claims that they are relatively complex assemblies of features, for example

involving orientation and colour, or are some arrangements of edges (Rensink, 2000a, pp. 20,

23; Rensink, 2000b, pp. 1473). In addition, they are not “pixels”, but rather more complex

locations are involved in their structure (Rensink, 2000a, pp. 22). Such a description suggests

that proto-objects can be identified with the feature bundles postulated in FIT (see (10)) or

with some spatial arrangements of feature-bundles:

(14) An entity is a proto-object iff (1) it is identical to a feature-bundle, or (2) is

identical to some proto-objects standing in appropriate spatial relations to each other.

The identity of constituents and the relations between them provides the identity

criterion for proto-objects:

(15) Proto-objects are different iff they are constituted by different entities or their

constituents stand in different spatial relations.

The above remarks suggest that CTA concerns a later level of perceptual

processing than that described by FIT. However, there are some inconsistencies between

those models in their descriptions of representational content. Within CTA

it is clear that objects are not identical with proto-objects – this model assumes that only one

object may be represented at a time, but proto-objects may be represented in great numbers at

every moment (Rensink, 2000a, pp. 23). However, in some works describing FIT it is claimed

that representing assemblies of features is equal to representing “object tokens” (Treisman,

1996, pp. 172). Other psychological models concerning the stages of perceptual process at

which objects start to be represented seem to support CTA in this respect (e.g. Kahneman et

al., 1992; Pylyshyn, 2007; Raftopoulos, 2009). Despite various differences, they all

agree that represented objects are not just some combinations of features and locations.

In the subsequent paragraphs, I use the term ‘object’ in the manner proposed in CTA.

Of course, the choice of terminology is to some degree conventional, and in some sense

feature-bundles may be called objects, for example, because they are individuals, individuated

by locations. But what is really important is that, according to CTA, at the medium level of

perceptual processing new entities are represented, which are related to feature-bundles but

are not identical to them. Those new entities possess properties (and in this respect differ from

feature-bundles, which do not possess properties but are combinations of features and

locations), have parts, and are able to persist through change. I will refer to such entities as

‘objects’.

4.2 Proto-objects and Objects

As was stated above, in CTA objects are not identical to proto-objects, and they are

also not identical with the features and locations that constitute them. Because of this, new

types of entities are represented at the level of perceptual processing described in CTA

(compare this to (13)):

(16) Every entity is a feature, or is a location, or is a proto-object, or is an object. No

entity can belong to more than one entity-type.

However, objects are hardly unrelated to proto-objects. First of all, objects possess

properties, like size, shape, or color (Rensink, 2000a, pp. 23), and there are certain rules that

govern which properties are possessed by an object. In principle, every location and every

feature may be a property of an object. However, if an object possesses a property then this

property is associated with a proto-object (Rensink, 2000a, pp. 24). What is more, an object

does not take properties associated with any proto-object, but only from some subset of all

proto-objects represented at a given time. So it seems that there is a relation which connects

objects with one or a few proto-objects. Additionally, these claims are justified by the

description of how objects are represented. A nexus – a structure that represents a single object

(Rensink, 2000a, pp. 25) – is “linked” with one or more lower-level structures that represent

proto-objects (Rensink, 2000a, pp. 23; Rensink, 2000b, pp. 1473). This “link” transfers

information from lower-level structures to the nexus allowing it to represent an object as

possessing properties (Rensink, 2000b, pp. 24). This suggests that for every represented

object there are some proto-objects that are represented as standing in a special relation with

that object. In CTA, and also in other models of mid-level vision, it is claimed that objects do

not have to possess all of the properties associated with proto-objects that are related to them.

It is also suggested that objects possessing only location-based properties may be

represented (Rensink, 2000a, pp. 36-37; Rensink, 2000b, pp. 1476). Moreover, in various

situations properties of objects may be represented in a sketchier or in a more detailed way

(e.g. Kahneman, Treisman, Gibbs, 1992, pp. 178). Sometimes it is even stated that the visual

system starts by representing objects without representing that they possess any properties

(Raftopoulos, 2009, pp. 92). What is more, according to CTA, objects have an internal

structure, composed of parts that are proto-objects (Rensink, 2000a, pp. 23-24). In that case

proto-objects play a double role: they supplement objects with properties and also build part-

structures of objects.

This, quite complicated, picture may be made clearer by distinguishing, one after

another, all of the relations that occur between objects and proto-objects. Firstly, there is a

relation between every object and one or a few proto-objects that is represented by the “link”

between the nexus and lower-level structures (I will call it the L-relation). Such a relation

determines the necessary and sufficient conditions for being an object – it is an entity that

stands in L-relation to some proto-objects:

(17) Something is an object iff it stands in L-relation to at least one proto-object.

Relying on descriptions of CTA, it is hard to specify in detail the formal properties of

the L-relation. It should not be treated as a transitive relation, as this would lead to the conclusion

that proto-objects may be L-related to other proto-objects and that distinct objects may be L-

related to each other.

Secondly, an object can be related to features and locations and in virtue of this

relation these features and locations are properties of this object. It seems plausible that

objects may be also characterized by more complex properties that are not identical to

features and locations, but are connected with structures of certain proto-objects (“structural

properties”). It is less clear, however, whether at the medium-level of visual processing there

is a structural property associated with every proto-object, or if some structural properties can

be represented only at the higher-levels.

The above relationship is asymmetric and intransitive: objects do not characterize

themselves, objects do not characterize their properties, and two properties of a single object

do not characterize each other. Its formal features, and its function of connecting properties with

objects, make it a perceptual counterpart of the “instantiation” or “characterization” relation that is

known from metaphysical theories of objects (e.g., Lowe, 2006, pp. 22). In philosophical

literature a so-called “third man” problem is often associated with instantiation. It seems that

if an object instantiates redness, then the redness also, and quite trivially, instantiates redness.

This entails a regress because there should be another redness that is instantiated both by the

object and by the first redness in virtue of which the object and the first redness are red.

Cognitive models of vision like FIT or CTA omit such problems simply by characterizing

features as atomic elements, which are represented from the beginning of the perceptual

process and do not need any explanation for what they are. It is a primitive fact that a feature

like redness is red, and there is no other visually represented object or relation in virtue of

which this is the case.

As was noted earlier, the L-relation determines which properties may characterize an

object:

(18) If an object is characterized by a property (a feature, a location, or a structural

property), then there is a proto-object that is L-related to this object and this proto-object is

constituted by the property in question (in the case of features and locations) or is associated

with the considered property in virtue of its structure (in the case of structural properties).

However, the implication in the opposite direction does not hold, as objects do not

have to possess all the properties associated with proto-objects that are L-related to them.

Thirdly, the proto-objects may constitute the part-structures of objects. The

asymmetrical and transitive relation of proper parthood, described earlier (see (2)), may be

used once more to characterize the connection between proto-objects and objects. Although it

is not specified in descriptions of CTA, it seems plausible to suppose that, as in the

case of characterization, the proto-object is a proper part of an object only if it is also L-

related to this object. Nevertheless, standing in the L-relation is not a sufficient condition of

being a proper part. However, having a proto-object as a proper part can reasonably be treated

as a sufficient condition for being characterized by an appropriate structural property. If this is

the case, the following principles may be introduced:

(19) If a proto-object is a proper part of an object, then this proto-object is L-related

to this object.

(20) If a proto-object is a proper part of an object, then this object is characterized by

a structural property associated with this proto-object (if for the considered proto-object

there is such a property).
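Principles (17) to (20) can be read as constraints on a data structure and checked mechanically. The following Python sketch is only an illustration under stated assumptions: the classes `ProtoObject` and `VisualObject`, their fields, and the function `violated_principles` are hypothetical names, properties are modelled as strings, and (17) is checked only in its necessary direction (an object must have at least one L-related proto-object).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class ProtoObject:
    name: str
    constituents: frozenset                    # features and locations
    structural_property: Optional[str] = None  # may be absent, see (20)

@dataclass
class VisualObject:
    name: str
    l_related: list = field(default_factory=list)     # L-related proto-objects
    properties: set = field(default_factory=set)      # features, locations, structural properties
    proper_parts: list = field(default_factory=list)  # proto-objects serving as parts

def violated_principles(obj):
    """Return the numbers of the principles (17)-(20) that `obj` violates."""
    violated = []
    if not obj.l_related:
        violated.append(17)
    # (18): every property must be backed by some L-related proto-object.
    supported = set()
    for proto in obj.l_related:
        supported |= proto.constituents
        if proto.structural_property is not None:
            supported.add(proto.structural_property)
    if not obj.properties <= supported:
        violated.append(18)
    # (19): every proper part must be L-related to the object.
    if any(part not in obj.l_related for part in obj.proper_parts):
        violated.append(19)
    # (20): a part's structural property, if any, must characterize the object.
    if any(part.structural_property is not None
           and part.structural_property not in obj.properties
           for part in obj.proper_parts):
        violated.append(20)
    return violated

edge = ProtoObject("edge", frozenset({"redness", "location1"}), "edge-shaped")
full = VisualObject("o1", [edge], {"redness", "edge-shaped"}, [edge])
sparse = VisualObject("o2", [edge])               # L-related, but no properties taken over
broken = VisualObject("o3", [], {"greenness"}, [edge])

print(violated_principles(full))
print(violated_principles(sparse))
print(violated_principles(broken))
```

The `sparse` object passes all checks, which reflects the point that the implication in (18) does not hold in the opposite direction: being L-related to a proto-object does not force an object to take over its properties.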

In sum, objects are not identical to proto-objects, but every object is L-related to at

least one proto-object. Objects may be characterized by properties associated with proto-

objects that are L-related to them. In addition, proto-objects that are L-related to an object

may serve as its proper parts.

As mentioned earlier, the claim that there are objects of content that are not reducible

to a combination of features and locations is not exclusive to the CTA. In fact, it is a

characteristic feature of models that explain the way in which a visual system represents

persistence. Other such models, like object-file theory (Kahneman et al., 1992) or Pylyshyn’s

FINST model (Pylyshyn, 2007) postulate representational vehicles that represent numerically

distinct objects that can possess features but are not reducible to them. Again, as in the case of

FIT, this shows that the ontology connected with the CTA is a part of a more general picture

within contemporary cognitive psychology and that it does not strongly depend on the

correctness of all empirical details of the CTA.

4.3 Persisting Objects

Up to this point objects and proto-objects have been considered only from a “static”

perspective. In the next few paragraphs, I will investigate the dynamic aspects of these entities

as they are connected with persistence through change. In addition, those considerations will

allow me to address the more general issue of the individuation of objects. By ‘identity’ I

mean the equivalence relation that satisfies a version of Leibniz’s Law:

(21) If an entity x is identical to an entity y, then for each time T, something is true

about x at T iff it is true about y at T.

It should be noted that the above principle allows that entities existing at different

times may be identical even if they do not possess the same properties or do not have the

same constituents, etc.

According to CTA, proto-objects are volatile (Rensink, 2000a, pp. 20), as every

change in the constituents of a proto-object leads to a replacement of this proto-object with a

new one. In addition, even structural sameness does not guarantee the identity of proto-

objects in diachronic contexts. This is because if one proto-object ceases to exist and a

structurally identical proto-object appears at a later moment, these proto-objects are not

represented as identical. It seems that to be identical proto-objects need to have the same

structure and to be temporally continuous. This strict identity criterion can be expressed by

the following rule:

(22) A proto-object x is identical to a proto-object y iff there exists a finite ordered

series, the elements of which are proto-objects existing at moments of time such that:

(1) x is the first element of the series and y is the last element of the series, and

(2) if an element of the series exists at time T, then the next element exists at T or at

the subsequent moment T+1, and

(3) every element of the series is composed of the same features, the same locations,

and the same proto-objects, related in the same way.

Two proto-objects satisfy (22) only if they are connected with a chain of proto-objects

that do not differ in any respect. This thesis accommodates the idea that every change in

proto-objects breaks their identity. The above rule also allows us to determine the identity of

proto-objects which exist at the same moment of time. In that case the series can be described

as containing only two proto-objects that exist at the same moment.
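Rule (22) quantifies over series of proto-objects; the conditions a candidate series must meet can be checked directly. The sketch below is an illustration under simplifying assumptions: time is a sequence of integer moments, each proto-object stage is a pair of a moment and a composition, the composition is a hashable value that stands in for "the same features, locations, and proto-objects, related in the same way", and the function name `is_identity_series` is hypothetical.

```python
def is_identity_series(series):
    """Check conditions (2) and (3) of rule (22) for a candidate series.

    `series` is a list of (moment, composition) pairs, first element x, last element y.
    """
    if not series:
        return False
    for (t0, comp0), (t1, comp1) in zip(series, series[1:]):
        if t1 - t0 not in (0, 1):   # (2): same moment or the subsequent one
            return False
        if comp0 != comp1:          # (3): no difference in composition
            return False
    return True

square = frozenset({"redness", "squareness", "location1"})
circle = frozenset({"redness", "circularity", "location1"})

print(is_identity_series([(0, square), (1, square), (2, square)]))  # continuous, unchanged
print(is_identity_series([(0, square), (2, square)]))               # temporal gap
print(is_identity_series([(0, square), (1, circle)]))               # changed constituent
```

Two stages count as the same proto-object only if some series connecting them passes this check, so both a temporal gap and any change in composition break identity.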

In CTA it is postulated that represented objects, as opposed to proto-objects, are able

to persist through change (Rensink, 2000a, pp. 20). In presentations of CTA, it is only

vaguely claimed that objects may change properties and still remain the same. Fortunately, the

psychological literature is rich in examples regarding the identity of objects (e.g., Pylyshyn,

2007; Scholl, 2007). One of the most important paradigms in the investigation of the way in

which the visual system represents persistence is connected with the Multiple Object Tracking

(MOT) experiment. In the standard version of MOT participants are presented with a set of

uniform objects (e.g., black circles). Some of them (usually 4 to 6) are distinguished as targets

– they might, for example, be indicated by blinking a few times – and others serve as

distractors. Subsequently, all objects start to move in a random manner and the task of

participants is to track the targets. After some time the objects stop moving and the

participants are asked to identify the targets. Usually, the success rate is quite high if the

number of targets does not exceed six.

The MOT experiment has been conducted in various versions and its results allow for

the formulation of some rules that govern the identity of objects. First of all, objects are

identified as being the same so long as they change their position in a way that preserves

spatial continuity and coherence (Scholl, 2007; similar claims can also be found in the literature

concerning developmental psychology, see Xu, Carey, 1996). Spatial continuity is not

sustained if, for example, an object disappears and a new object, which may be qualitatively

the same in respect of properties other than location, appears in a different place. The

condition concerning spatial coherence is not fulfilled if, for example, an object divides into

several items (Mitroff, Scholl, Wynn, 2004). This is probably also the case when several

objects merge into one. In such cases, identity is not preserved through change. Properties

other than location seem to be less important, and their changes do not affect identity

(Pylyshyn, 2007, pp. 37).

It should be noted that the above principles describe typical cases rather than designate

necessary and sufficient conditions for the determination of the identity of objects. For this

reason, characterizing rules that govern diachronic identity seems difficult, as these rules

may be very complex; for example, they may consist of a huge list of principles, each

describing some very specific conditions. However, by using a notion of proto-objects a more

general and simple rule may be proposed. At each time each object is L-related to at least one

proto-object. Every change that an object may undergo is connected with a change regarding its

properties, or a change regarding its parts, or a change regarding proto-objects that are L-

related to it. Because an object possesses properties only if they are associated with proto-

objects that are L-related to it, every change of properties is related to a change in proto-

objects L-related to an object. The identity of objects is sustained when proto-objects that are

L-related to them stand in what we might call the ‘continuity and coherence relation’ or CC-

relation (reflexive, symmetric and intransitive). For example, object-1 existing at T-1 is

identical with object-2 existing at T-2 when: (a) a location constituting a proto-object L-related

to object-1 and a location constituting a proto-object L-related to object-2 overlap, and (b) a

proto-object L-related to object-2 starts to exist at T-2, and (c) there are no other proto-objects

that start to exist at T-2, the locations of which overlap with the location of the proto-object L-

related to object-1, and which are not L-related to object-2. It is quite plausible that in some

situations the CC-relation may be described by using the above (a), (b), and (c) conditions,

but in other cases different sets of rules will be needed. Because of that a full definition of the

CC-relation would probably consist of a huge range of different sets of conditions suited for

describing continuity and coherence in different empirical cases.

However, without going into specific detail concerning the various changes that

objects may undergo or into the measure of continuity and coherence that can accommodate

all those changes, it can be generally stated that:

(23) An object x is identical to an object y iff there exists a finite ordered series the

elements of which are objects existing at moments of time such that:

(1) x is the first element of the series and y is the last element of the series, and

(2) if an element of the series exists at time T, then the next element exists at T or at

the subsequent moment T+1, and

(3) every two subsequent elements of the series are L-related to proto-objects that

stand in the CC-relation.

If the CC-relation is able to determine the identity relation between objects, then it

cannot relate one proto-object to two or more different proto-objects. Otherwise,

because of the transitivity of identity, objects that exist at the same time but are L-related to

significantly different proto-objects (for example containing non-overlapping locations) may

be identified as being identical if their proto-objects are CC-related to a single proto-object

that existed earlier. More specifically, it should be assumed that:

(24) If a proto-object exists at moment T, then: (1) it is CC-related to at most one

proto-object existing at the subsequent moment T+1, and (2) it is CC-related to only one

proto-object existing at T, namely itself.
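
Clause (1) of (24) amounts to requiring that the CC-relation does not branch forward in time. A minimal Python sketch of such a check is given below, assuming (hypothetically) that the CC-relation between adjacent moments is encoded as a set of pairs of proto-object labels. Clause (2) is not checked, since this encoding contains only cross-moment pairs. The function names are illustrative only.

```python
from collections import defaultdict


def branching_proto_objects(cc_pairs):
    """Return proto-objects that are CC-related to more than one successor."""
    successors = defaultdict(set)
    for earlier, later in cc_pairs:
        successors[earlier].add(later)
    return {p: later for p, later in successors.items() if len(later) > 1}


def satisfies_24(cc_pairs):
    """True when no proto-object has two or more CC-successors."""
    return not branching_proto_objects(cc_pairs)


linear = {("p1", "p2"), ("p2", "p3")}
split = {("p1", "p2a"), ("p1", "p2b")}  # one proto-object, two successors

print(satisfies_24(linear))            # True
print(branching_proto_objects(split))  # maps "p1" to both successors
```

The `split` case is the situation excluded by (24): if it were allowed, the objects L-related to p2a and p2b would both be identical to the earlier object and hence, by transitivity, to each other.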

It is worth noting that identical objects existing at different moments may be L-related

to completely different proto-objects, and so may not have any common properties and proper

parts. What is important is that there is a chain of CC-related proto-objects that are L-related

to the considered objects. Analogously, there may be non-identical objects that exist at

different moments and which are L-related to proto-objects that are constituted by exactly the

same features and the same locations. Rather as in the case of proto-objects (see (22)), the

rules (23) and (24) may be used to determine the identity of objects existing at a single

moment. They entail that the objects existing at T are identical iff they are L-related to the

same proto-objects. This is because for any object at moment T, the proto-objects L-related to

it stand in CC-relations to themselves. It is interesting to note that according to the CTA-

ontology objects are not necessarily individuated by their properties, because being L-related

to a proto-object does not constitute a sufficient reason for being characterized by properties

associated with this proto-object (see (18)).

It should be noted that the visual system does not need to store information about past

patterns of the CC-relation between proto-objects in order to identify objects in accordance

with (23). It is sufficient to represent relations occurring between subsequent moments.
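
The point that no history of CC-relations has to be stored can be illustrated with a small Python sketch. It assumes, hypothetically, that the system keeps only a current assignment of identity labels to proto-objects and updates it from the CC-relations holding between T and T+1. Because of (24), those relations are encoded here as a dictionary from each proto-object to its single successor. The function `advance` and the label format are illustrative, not part of the CTA.

```python
def advance(labels, cc_step, protos_next):
    """Carry identity labels from moment T to moment T+1.

    labels:      proto-object at T -> identity label of its object
    cc_step:     proto-object at T -> CC-related proto-object at T+1
    protos_next: proto-objects existing at T+1
    """
    next_labels = {}
    for earlier, later in cc_step.items():
        if earlier in labels:
            # The successor inherits the identity of its predecessor.
            next_labels[later] = labels[earlier]
    for proto in protos_next:
        # A proto-object without a CC-predecessor starts a new identity.
        next_labels.setdefault(proto, f"object-{proto}")
    return next_labels


labels = {"p1": "A"}
labels = advance(labels, {"p1": "p2"}, ["p2", "q2"])
print(labels)  # {'p2': 'A', 'q2': 'object-q2'}
labels = advance(labels, {"p2": "p3"}, ["p3"])
print(labels)  # {'p3': 'A'}
```

After each update the earlier assignment is discarded, yet the label "A" still tracks the same object across three moments.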

4.4 The Ontology of CTA – Summary and Philosophical Connections

The objects of mid-level vision seem to be rather peculiar entities. They are not

identical with proto-objects but are L-related to them, they may possess properties but are not

identical to some collections of them, they may have proper parts but again they are not

identical to the sums of their proper parts. What is more, objects existing at different moments

may be identical without sharing any properties or being related to proto-objects that share

some constituents. However, the identity of objects is grounded in chains of continuous and

coherent proto-objects.

The ontology of content derived from CTA clearly shows that objects, contrary to

bundle theories, are not identical to some relational combinations of features, and so Russell’s

proposal does not seem to adequately describe the content of medium-level visual

representations. The ontological model suggested by CTA has properties that make it closer to

another classical analytic notion of objects: substratum theory. According to substratum

theories, objects cannot be analyzed only in terms of features and their connecting relations.

Rather, the ontological structure of objects is also constituted by a non-qualitative substratum

(Casullo, 1982). Usually, this additional element has two functions: it serves as a subject of

properties and it individuates objects. If the visual content ontology derived from CTA is

correct, then at the mid-level of visual processing, the content ontology in some respects

satisfies the postulates of substratum theories. Firstly, objects are not identical to

combinations of properties, and secondly they serve as subjects for properties. However,

against standard versions of substratum theories, the identity of objects is not a primitive fact,

but rather relies on the continuity and coherence relations between proto-objects.

Because, as was noted earlier, models of early vision, like FIT, characterize visual

objects in terms of combinations of features and locations, while models of persistence, like

CTA, describe them as irreducible, numerically different individuals, we face an interesting

ontological transition. This change is not restricted to the particular examples of FIT and the

CTA, but rather designates a distinction between two classes of scientific models of vision

that differ in their descriptions of visual objects. It is tempting to postulate that the distinction

between two types of visual objects (feature-bundles and irreducible individuals) is

correlated with earlier and later stages of the perceptual process. Complex objects of content

represented by the earliest representations may be adequately described by conceptual tools

proposed in philosophical bundle-theories of objects, but when we start to represent

persistence the ontology of visual content includes objects that are more similar to those

postulated in substratum theories.

5. Conclusions

The main goal of this paper was to sketch an ontological model of visual content at the

low- and medium-levels of visual processing. I argued that scientific conceptions of vision

contain assumptions concerning the objects of content, i.e. objects whose presence is a

necessary condition of visual representations’ adequacy, and that we can identify an

ontological model of content upon which they rely. The investigations were based on the

Feature Integration Theory and the Coherence Theory of Attention. However, the proposed

model is not limited to the specific postulates of these two theories but in important respects is

consistent with assumptions generally held in models of early visual representations and

models of visual persistence.

According to the model I have presented, at the low-level stages of perceptual

processing, locations and features, which are the most basic objects of content, combine into

represented feature-bundles. These ontological structures seem to be similar to those

postulated by analytic bundle theories, and they constitute a type of visual object that is

typically postulated in scientific models of early vision.

At the subsequent levels of perceptual processing, feature-bundles are combined into

proto-objects and a new class of entities, containing objects, enters the picture. Objects, due

to their relation with proto-objects, possess features and have proper parts. They are not

identical to features or their combinations, and are able to persist through change. The

ontology of medium-level vision inspired by the CTA is, in important respects, similar to the

ontology proposed by substratum theories. Specifically, in both cases objects are not reducible

to features; they serve as subjects of properties; and they do not have to be individuated by

properties (their identity is, however, determined by relations between proto-objects).

The ontological investigation concerning the CTA suggests the existence of a

second type of visual object, which cannot be analyzed in terms of combining features and

locations and which is typically postulated in models of visual persistence. Relying on this

observation, a hypothesis may be proposed that visual content undergoes an important

ontological transition in the course of the perceptual process. Bundle-like objects of content

that are represented by the earliest representations are supplemented by irreducible particulars

when processes connected with representing persistence start to be activated.

Acknowledgements

The work was supported by the National Science Center (Poland) under grant

2012/05/N/HS1/03408.

References

Allaire, E. B. (1963). Bare Particulars. Philosophical Studies, 12(1/2), 1-8.

Bayne, T. (2009). Perception and the Reach of Phenomenal Content. The Philosophical

Quarterly, 59(236), 385-404.

Casullo, A. (1982). Particulars, Substrata, and the Identity of Indiscernibles. Philosophy of

Science, 49(4), 591-603.

Clark, A. (2004). Feature-placing and proto-objects. Philosophical Psychology, 17(4), 443-

469.

Coates, P. (2007). The Metaphysics of Perception. New York and London: Routledge.

Cohen, J. (2004). Objects, Places, and Perception. Philosophical Psychology, 17(4), 471-495.

Demirli, S. (2010). Indiscernibility and Bundles in Structures. Philosophical Studies, 151(1),

1-18.

Ehring, D. (2001). Temporal Parts and Bundle Theory. Philosophical Studies, 104(2), 163-

168.

Hubel, D., & Wiesel, T. N. (1962). Receptive Fields, Binocular Interaction and Functional

Architecture in the Cat’s Visual Cortex. Journal of Physiology, 160, 106-154.

Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The Reviewing of Object Files: Object-

Specific Integration of Information. Cognitive Psychology, 24, 175-219.

Lowe, E. J. (2006). The Four-Category Ontology: A Metaphysical Foundation for Natural

Science. Oxford: Clarendon Press.

Marr, D. (2010). Vision: A Computational Investigation into the Human Representation and

Processing of Visual Information. Cambridge, MA: MIT Press.

Matthen, M. (2004). Features, Places, and Things: Reflections on Austen Clark’s Theory of

Sentience. Philosophical Psychology, 17(4), 497-518.

Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide and Conquer. How Object Files Adapt

when a Persisting Object Splits into Two. Psychological Science, 15(6), 420-425.

Muller, F. A., & Saunders, S. (2008). Discerning Fermions. The British Journal for the

Philosophy of Science, 59(3), 499-548.

O’Callaghan, C. (2008). Object Perception: Vision and Audition. Philosophy Compass,

3(4), 803-829.

Pylyshyn, Z. W. (2007). Things and Places. How the Mind Connects with the World.

Cambridge, MA: MIT Press.

Raftopoulos, A. (2009). Cognition and Perception. How Do Psychology and Neural Science

Inform Philosophy? Cambridge, MA: MIT Press.

Rensink, R. A. (2000a). The Dynamic Representation of Scenes. Visual Cognition, 7(1/2/3),

17-42.

Rensink, R. A. (2000b). Seeing, Sensing, Scrutinizing. Vision Research, 40(10-12), 1469-

1487.

Roskies, A. L. (1999). The Binding Problem. Neuron, 24, 7-9.

Russell, B. (1956). An Inquiry Into Meaning and Truth. London: George Allen and Unwin

Ltd.

Russell, B. (2009). Human Knowledge: Its Scope and Limits. London and New York:

Routledge.

Schellenberg, S. (2011). Perceptual Content Defended. Noûs, 45(4), 714-750.

Schillen, T. B., & König, P. (1994). Binding by Temporal Structure in Multiple Feature

Domains of an Oscillatory Network. Biological Cybernetics, 70, 397-405.

Scholl, B. J. (2007). Object Persistence in Philosophy and Psychology. Mind and Language,

22(5), 563-591.

Siegel, S. (2010). Do Visual Experiences Have Contents? In B. Nanay (Ed.), Perceiving the

World (pp. 333-368). Oxford University Press.

Treisman, A. (1982). Perceptual Grouping and Attention in Visual Search for Features and for

Objects. Journal of Experimental Psychology: Human Perception and Performance, 8(2),

194-214.

Treisman, A. (1996). The Binding Problem. Current Opinion in Neurobiology, 6(2), 171-178.

Treisman, A. (1998). Feature Binding, Attention and Object Perception. Philosophical

Transactions of the Royal Society B: Biological Sciences, 353, 1295-1306.

Treisman, A. (1999). Solutions to the Binding Problem: Progress through Controversy and

Convergence. Neuron, 24(1), 105-110.

Treisman, A., & Gelade, G. (1980). A Feature-Integration Theory of Attention. Cognitive

Psychology, 12, 97-136.

Treisman, A., & Sato, S. (1990). Conjunction Search Revisited. Journal of Experimental

Psychology: Human Perception and Performance, 16(3), 459-478.

Tye, M. (2000). Consciousness, Color, and Content. Cambridge, MA: MIT Press.

Xu, F., & Carey, S. (1996). Infants’ Metaphysics: The Case of Numerical Identity. Cognitive

Psychology, 30, 111-156.