Target detection in urban clutter


1242 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 19, NO. 5, SEPTEMBER/OCTOBER 1989

Target Detection in Urban Clutter

J. MICHAEL CATHCART, MEMBER, IEEE, THEODORE J. DOLL, AND DAVID E. SCHMIEDER, MEMBER, IEEE

Abstract - Experiments were conducted to determine the influence of urban background clutter on observer target detection performance. Tests consisted of measuring observer ability to find tank targets in synthetic images displayed on a TV monitor. Results were measured in terms of detection probability versus false alarm probability for various spatial resolution and clutter levels. These results were compared to those from previous measurements made with rural clutter. The comparison showed that, contrary to expectations, better performance was obtained in urban clutter. It was postulated that the contextual cues of urban background clutter aided the observer screening process. The need for a better clutter definition and further research to generalize the results to a larger population of images is discussed.

I. INTRODUCTION

A previous study [1] was conducted to investigate the effects of clutter and resolution on observer target detection performance. That study was an attempt to determine the influence of clutter on the Johnson bar target equivalency criteria [2]. It found a strong correlation between clutter level, as expressed by the signal-to-clutter ratio (SCR), and detection probability for a given resolution level. The basic clutter configuration was that of rural North American or rural European landscapes. Effects of varying types of clutter configurations (e.g., desert, mountainous, urban) were not investigated. Results from the rural clutter study were used to postulate new detection resolution criteria for various clutter levels. These are expected to be useful in performance prediction modeling and in understanding requirements for machine vision systems. This correspondence reports the results of experiments conducted to extend the detection criteria to urban clutter configurations.

Several issues must be addressed when attempting to evaluate clutter effects on object detection. These include the need for a suitable merit function to characterize clutter magnitude, the ability to manipulate clutter, and the need for a model to describe the observer’s decision process. This investigation used the same clutter merit function defined in the previous rural clutter study. That merit function consisted of amplitude and spatial weighting factors. It was the result of an effort to obtain a simple description of what can be a very complicated structure. The results of this study yield more insight into the complexities of manmade clutter features.

Another issue is the ability to generate and vary the clutter type and magnitude. As in the previous study, this investigation used the flexibility of synthetic imagery, produced by computer image generation software, to establish the needed clutter control. Finally, a method is needed to correct raw observer test data for the effects of guessing and false alarms. This was done previously with the aid of the high threshold theory (HTT) of signal detection [3]. That theory allowed the results to be converted into a corrected detection probability without an associated nonzero false alarm probability. The latter interpretation allowed results to be compared to the Johnson criteria that did not provide for false alarms. In addition, use of the high threshold theory put comparatively lesser demands on the amount of synthetic imagery that had to be generated. This was a practical consideration since computer image generation is a time and resource intensive process. The present urban clutter study corrected for false alarms within the context of signal detection theory (SDT) [3]. That theory has the advantage that it better quantifies the detection process uncontaminated by observer response bias. Moreover, results deduced using SDT can be related to the HTT equivalent for the purpose of comparing urban clutter results to the previous rural clutter findings.

TABLE I
SCR-RESOLUTION PAIRS

Signal-to-Clutter Ratio (SCR)    Resolution, Line-Pairs per Target Height (lp/tgt)
0.33                             2.0
0.33                             4.0
1.20                             1.0
1.20                             2.0

Manuscript received June 25, 1988; revised February 24, 1989. This work was supported by Dr. Michael Kruse and Lt. Col. Gordon Hammond of the Air Force Geographics Laboratory, and Lt. Mario Moya of Air Force Wright Aeronautical Laboratories.

J. M. Cathcart and D. E. Schmieder are with Georgia Institute of Technology, GTRI/EML, Atlanta, GA 30332.

T. J. Doll is with Georgia Institute of Technology, GTRI/SEL, Atlanta, GA 30332.

IEEE Log Number 8928056.

The objective of this investigation was to extend the results of the previous rural clutter study to an urban background setting. Specifically the goal was to examine the combined effects of both resolution and urban clutter on target detection probability by human observers. These results would either support the use of the previous rural resolution criteria or lead to new resolution criteria for target detection in urban clutter configurations. It was hypothesized that the resolution criteria for urban clutter would be more severe than those in rural clutter. Finally, the results of this study were expected to provide insight into the effect of clutter configuration in general on the resolution levels required for target detection.

II. METHOD

A. Approach Summary

Image generation and infrared (IR) signature modeling software was used to generate synthetic scenes with controlled amounts of urban clutter, resolution, and target signal intensity. Four human observers each reviewed a large number of such scenes, and indicated their confidence that each did, or did not, contain a specified target. Observer detection data were analyzed using the SDT approach that quantifies target detectability in terms of the observer's sensitivity. The sensitivity data were converted to probabilities of detection (Pd) in order to compare the present data with that of the previous study of target detection in rural clutter.

B. Image Preparation

The experimental procedure was designed such that an observer was required to detect the presence of an M1 tank in a synthetic infrared image. To achieve the necessary stability in the observer data, a significant number of data points were required for each stimulus condition (i.e., an SCR/resolution pair) to be evaluated. Based on previous research [3], [4], [5] and constraints on computational resources, it was determined that a minimum of 300 data points would be required for each stimulus condition. In retrospect it was found that this provided stable estimates of hit and false alarm rates. By using each image twice, the total number of synthetic images required was reduced to 150. Ten images were needed for observer training purposes with the result that the total number of synthetic images required was 160. For the computation of the false alarm probability, the experimental methodology required that half of the images be populated with a single target. The range of SCR/resolution values was desired to correspond to the case where the probability of target detection was approximately 0.5. The stimulus conditions selected for this study are given in Table I.
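The image-count arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch: it assumes the quoted counts apply to a single stimulus condition and that "half of the images" refers to the 150 test images, neither of which the text states outright. All variable names are illustrative.

```python
# Assumption: the counts quoted in the text apply to one stimulus condition.
DATA_POINTS_PER_CONDITION = 300   # minimum number of data points stated in the text
PRESENTATIONS_PER_IMAGE = 2       # each image was used twice
TRAINING_IMAGES = 10              # images set aside for observer training

test_images = DATA_POINTS_PER_CONDITION // PRESENTATIONS_PER_IMAGE
total_images = test_images + TRAINING_IMAGES

# Assumption: "half of the images" means half of the test images.
target_images = test_images // 2
no_target_images = test_images - target_images

print(test_images, total_images, target_images, no_target_images)
```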

As with the original rural study, the present study used synthetic infrared imagery as the basis for the measurement of the

0018-9472/89/0900-1242$01.00 © 1989 IEEE


target was positioned randomly in the scene and oriented such that the distinguishing features of the tank (e.g., gun barrel, exhaust port) were in view.

The signal-to-clutter ratio (SCR) was determined after each image was filtered. Filtering of a scene modified the signal, background, and clutter values for a scene. Since the SCR value could not be predicted prior to filtering, an iterative procedure was employed to produce images with the proper values of the signal-to-clutter ratio for each resolution. The definition of clutter employed in the current study was

$$\text{clutter} = \left( \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2 \right)^{1/2} \tag{1}$$

where the sum is over the total number of square cells in the scene, $N$. For the current set of images, the cell size was 1.73 times the target height. This provided an integral number of cells across the image. The radiance standard deviation, $\sigma_i$, for a particular cell, was defined by

$$\sigma_i = \left( \frac{1}{K} \sum_{j=1}^{K} \left( x_{ij} - \mu_i \right)^2 \right)^{1/2} \tag{2}$$

where $x_{ij}$ is the intensity value of the $j$th pixel in the $i$th cell, $\mu_i$ is the average intensity in the $i$th cell, and $K$ is the total number of pixels in the cell.
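A minimal sketch of this clutter measure follows. It assumes that clutter is the root-mean-square of the per-cell standard deviations and that each cell uses the population form of the standard deviation (division by the pixel count). The function names, the toy image, and the choice to ignore partial cells at the image edge are illustrative assumptions, not details taken from the study.

```python
import math

def cell_std(pixels):
    """Population standard deviation of one cell's pixel intensities."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((x - mean) ** 2 for x in pixels) / len(pixels))

def clutter(image, cell):
    """RMS of per-cell standard deviations over non-overlapping square cells.

    image: list of equal-length rows of intensities.
    cell:  cell side in pixels (hypothetical parameter; the text sets the
           cell size to 1.73 times the target height).
    Partial cells at the right and bottom edges are ignored (an assumption).
    """
    rows, cols = len(image), len(image[0])
    variances = []
    for r in range(0, rows - cell + 1, cell):
        for c in range(0, cols - cell + 1, cell):
            pixels = [image[r + dr][c + dc]
                      for dr in range(cell) for dc in range(cell)]
            variances.append(cell_std(pixels) ** 2)
    return math.sqrt(sum(variances) / len(variances))

# Toy 4x4 image split into four 2x2 cells; two cells are perfectly uniform.
image = [
    [0, 2, 5, 5],
    [2, 0, 5, 5],
    [1, 1, 0, 4],
    [1, 1, 4, 0],
]
print(clutter(image, cell=2))
```

Uniform cells contribute zero variance, so the measure rises only where intensity varies within a cell on the scale of the target.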

Calculation of the signal-to-clutter ratio was performed for each image containing a target. Each image was presented on the display as a gray-scale image; thus, the radiances for all scene elements were normalized to 255 gray-scale levels, with the division between levels determined by the difference between the largest and smallest radiance values in the image. All measurements and calculations performed on the scenes were made in terms of this gray scale, instead of the actual radiance values. The calculation of the SCR was based on the following formula:

$$\mathrm{SCR} = \frac{\left| \text{target intensity} - \text{background intensity} \right|}{\text{clutter}} \tag{3}$$

The target intensity was taken as the highest pixel gray-scale value found on the visible portion of the tank. The background value was established by averaging several background pixel values taken from the immediate vicinity of the tank. This procedure assumed that the engine compartment was warmer than the immediate background but the tank was not necessarily the warmest object in the scene. The highest target intensity value established a higher SCR than the lowest target intensity value. Fig. 2 shows a representative image for the four stimulus conditions.
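The gray-scale normalization and SCR calculation might be sketched as below. The linear mapping onto integer levels 0 to 255 is one plausible reading of the normalization described, and the function names, parameter names, and sample values are assumptions made for illustration.

```python
def to_gray_levels(radiances, max_level=255):
    """Linearly map radiances onto integer gray levels 0..max_level.

    The step between levels is set by the spread between the largest and
    smallest radiance in the image (assumed reading of the text).
    """
    lo, hi = min(radiances), max(radiances)
    if hi == lo:
        return [0 for _ in radiances]
    step = (hi - lo) / max_level
    return [round((r - lo) / step) for r in radiances]

def signal_to_clutter(target_pixels, background_pixels, clutter_value):
    """SCR = |highest target gray level - mean nearby background| / clutter."""
    target = max(target_pixels)
    background = sum(background_pixels) / len(background_pixels)
    return abs(target - background) / clutter_value

# Hypothetical gray-scale values for one scene.
tank_pixels = [100, 180, 140]      # visible portion of the tank
nearby_pixels = [60, 80, 70, 70]   # background pixels near the tank
print(signal_to_clutter(tank_pixels, nearby_pixels, clutter_value=50.0))
```

Because the target intensity is the single brightest tank pixel, the resulting SCR is an upper bound relative to any other choice of target pixel, which matches the remark that the highest target intensity yields the higher SCR.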


C. Experimental Procedure

Each of the four observers was tested individually. The observers viewed a series of test scenes presented on a DeAnza 8500 high resolution monitor driven by a DEC Micro-VAX II computer. Each image appeared for five seconds, during which time the observer searched visually for the target. At the end of the viewing period, the observer was prompted by an audible tone to make a response. The response consisted of pressing one of four keys on the computer keyboard to indicate his level of confidence (Table II) that the scene contained a target.

During the response period the computer waited five seconds for a response. If the observer responded with a valid response before the five second time interval had elapsed, the response was recorded and the next image displayed immediately. If no answer or an invalid answer was given, the computer would prompt the observer again and wait five more seconds. If no answer was given on this second try, the computer marked the response for that image as being invalid and moved to the next image.
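The response-collection logic described here can be sketched as a small function. This is an illustration only, not the software used in the experiment: the key labels, the function names, and the injected `read_key` and `prompt` callables are all assumptions, chosen so that the control flow can be exercised without real hardware.

```python
VALID_KEYS = {"1", "2", "3", "4"}  # hypothetical labels for the four confidence keys
TIMEOUT_S = 5.0                    # response window stated in the text

def collect_response(read_key, prompt, max_tries=2):
    """Return a valid key, or None if the trial must be marked invalid.

    read_key(timeout) -> str or None: supplied by the caller; returns the key
        pressed within `timeout` seconds, or None if the window expires.
    prompt(): supplied by the caller; sounds the tone that asks for a response.
    """
    for _ in range(max_tries):
        prompt()
        key = read_key(TIMEOUT_S)
        if key in VALID_KEYS:
            return key          # recorded; the next image follows immediately
    return None                 # no valid answer after the second try

def scripted(keys):
    """Test double: replay a fixed list of key presses (None means timeout)."""
    pending = iter(keys)
    return lambda timeout: next(pending)

print(collect_response(scripted(["3"]), prompt=lambda: None))
print(collect_response(scripted(["x", None]), prompt=lambda: None))
```

Injecting the keyboard and tone as callables keeps the timing policy separate from the input device, which is one way to obtain the uniform timing the procedure calls for.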

The five second viewing interval was determined through informal pilot testing and the results of previous research on target detection in cluttered scenes [9]. After a few practice trials, observers reported that five seconds provided ample time to search the image. Lengthening the observation interval too greatly would have degraded performance by producing observer boredom [3, p. 396].

A procedure was written that controlled two computer monitors to display the image for the correct length of time, prompt the user for a response, record and check it, and write the response to a file. Computer control of the procedure ensured uniform timing and accurate recording of the responses. The computerized procedure limited the experimenter influence on the observer during acquisition of the data.

To ensure uniformity in the image contrast between images, the display controls were set to the same positions as those used to derive the Gaussian filter parameters. The image display and an associated computer terminal were set up in a partitioned area in a larger room to minimize external distractions. The display was placed on a table approximately 18 inches from the observer. The room was partially darkened to minimize the glare and reflections on the display monitor.

The order of image presentation followed a repeated Latin square experimental design. This method was designed to control for any linear effects related to the order in which the four stimulus conditions were presented to the observer. For example, the observer might become more proficient with practice; hence conditions presented later in the sequence would benefit. Then differences in performance between conditions would reflect the practice effect as well as the inherent qualities of the test conditions. In a similar fashion, observer fatigue might affect performance in successively presented conditions.

Table III illustrates the image presentation order for the observer test sessions. The letters A through D in this table represent the four stimulus conditions to be tested. Each row represents a different ordering of the stimulus conditions, i.e., each row is designated a test sequence. Note that each stimulus condition appears twice in each row, and that the average position of each condition in the sequence is the same for all four conditions. This sequence structure for the test conditions caused any linear effect related to the order of presentation (e.g., practice, fatigue) to cancel out within each subject's performance.
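The balancing property described here can be demonstrated with a small sketch. Table III itself is not reproduced in this transcript, so the sequences below are hypothetical: each is a cyclic rotation of A through D followed by its mirror image, one construction that has the stated properties. The function names are likewise illustrative.

```python
from collections import Counter

CONDITIONS = "ABCD"

def mirrored_sequences(conditions=CONDITIONS):
    """Hypothetical test sequences: each cyclic rotation followed by its mirror."""
    n = len(conditions)
    rows = []
    for shift in range(n):
        first_half = [conditions[(shift + k) % n] for k in range(n)]
        rows.append(first_half + first_half[::-1])
    return rows

def mean_positions(row):
    """Average 1-based position of each condition within one sequence."""
    totals, counts = Counter(), Counter()
    for position, condition in enumerate(row, start=1):
        totals[condition] += position
        counts[condition] += 1
    return {c: totals[c] / counts[c] for c in totals}

def position_counts(rows):
    """How often each condition occupies each position across all sequences."""
    return Counter((position, condition)
                   for row in rows
                   for position, condition in enumerate(row, start=1))

rows = mirrored_sequences()
for row in rows:
    print(" ".join(row), mean_positions(row))
```

In each printed row every condition appears twice with the same mean position, so a linear practice or fatigue trend adds the same amount to every condition. Across the four rows each condition also occupies each of the eight positions equally often, which is the property used to cancel order effects over observers.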

Each of four observers received a training session of 80 images and two test sequences chosen from the test sequences given in Table III. From the training imagery the observers could see the target was characterized by a warm hull and hot engine compartment. Across the four observers, each sequence was run exactly twice. This caused each condition, A through D, to appear equally often in each position. This aspect of the Latin square design ensured that any effect, linear or otherwise, related to the order of test condition presentation canceled out over observers.

Each sequence contained eight blocks (A, B, C, etc.) of 75 synthetic infrared images. Each block was preceded by a warm-up session. Within a test sequence, images were randomly ordered and were not repeated. The test sessions for each observer were separated by at least 24 hours. Additional details on the experimental procedure can be found in [10].

The observers were four students (three male and one female) at the Georgia Institute of Technology, Atlanta, GA, who ranged in age from 14 to 23 years. Observers were tested for Snellen acuity (20/20 corrected or uncorrected required) and the ability to detect the target at minimum contrast. Observers were paid for their participation.

III. DATA REDUCTION

A. Observer Detection Models

The response of a human observer in a detection or discrimination task reflects not only perceptual processes, but also factors such as strategy, motivation, and expectations. Therefore it is necessary to process the raw data from a detection experiment to separate effects due to perceptual processes from those due to the other factors, which are generally referred to as response biases. In order to achieve the separation, one must have a model of the detection process, i.e., a description of how response biases affect the observer's responses in a detection task.

The classical model of the human detection process assumes that the human observer has a discrete, fixed threshold for the signal. This viewpoint, called the high-threshold model, is still useful as a model of some types of electronic detection systems, but does not adequately characterize the human detection process. In particular the probability of detection, Pd, in the high-threshold model is influenced by factors related to response biases, such as incentives and the relative frequency with which the signal is presented (as opposed to presentations of no signal or noise only). The high-threshold model is one of a family of threshold models, all of which have similar shortcomings [cf. 3].

The failure of threshold-type models of the human detection process led to the development of signal detection theory (SDT) [4]. SDT makes it possible to estimate the observers' perceptual sensitivity (d') separately from the response criterion (beta), and also imposes restrictions on the method of data collection [3]. There is a great deal of evidence that SDT provides a valid characterization of human detection processes for a wide variety of detection tasks. SDT-based methods are widely accepted as the standard in psychophysical research [5].

B. Data Reduction Procedure

The data reduction procedure in the present investigation involved two major steps. First, SDT was used to estimate observers' perceptual sensitivities (d's) uncontaminated by response biases for each of the background conditions. Second, the d's were converted to equivalent Pd values in the high-threshold model, assuming a constant response criterion. The conversion made it possible to compare the present results with those of the previous study of target detectability in rural clutter [1].

Note to Table III: The ' indicates the second presentation of the given stimulus condition.

The first part of the data analysis was conducted in accordance with the procedure outlined by Green and Swets [3, pp. 99-103] for rating-scale experiments. This involved calculating the proportion of hits and false alarms for each observer in each of the four viewing conditions and for each of three levels of response confidence. These calculations defined three points on each of 16 receiver operating characteristics (ROC's), as shown in Fig. 3. The three points on each ROC represent the hit and false alarm proportions for the three levels of response confidence.

The hit and false alarm proportions for each of the 16 combinations of observer and viewing condition were then converted to standard normal scores (z scores). The equal-variance Gaussian model was used to calculate d's for all conditions. The formula for d' is as follows [3, p. 405]:

$$d' = z(\text{yes} \mid S) - z(\text{yes} \mid N) \tag{4}$$

where $z(\text{yes} \mid N)$ is the z score corresponding to the proportion of false alarms and $z(\text{yes} \mid S)$ is the z score corresponding to the proportion of hits. The proportions of hits and false alarms for the intermediate level of response confidence were used in calculating d' for each observer/viewing condition combination. The smooth curves in Fig. 3 are the ROC's for each condition based on the d' values derived from (4) and the equal-variance, Gaussian assumptions.
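As a sketch of this first analysis step, the code below turns rating counts into cumulative hit and false alarm proportions and computes d' as the difference of the two z scores at the intermediate criterion. The counts are made up for illustration, the function names are assumptions, and the ordering of the confidence categories (most confident "target present" first) is assumed rather than taken from Table II.

```python
from statistics import NormalDist

def cumulative_proportions(counts):
    """Turn rating counts into cumulative 'yes' proportions.

    counts: trials per confidence category, ordered from most confident
    'target present' to most confident 'target absent' (assumed ordering).
    Four categories give three criteria, hence three ROC points.
    """
    total = sum(counts)
    running, points = 0, []
    for c in counts[:-1]:
        running += c
        points.append(running / total)
    return points

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian sensitivity: z(hit) minus z(false alarm)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Made-up counts for one observer and condition (150 target, 150 no-target trials).
target_counts = [60, 45, 30, 15]
noise_counts = [6, 24, 45, 75]
hits = cumulative_proportions(target_counts)
false_alarms = cumulative_proportions(noise_counts)
print(list(zip(false_alarms, hits)))          # three ROC points
print(d_prime(hits[1], false_alarms[1]))      # intermediate criterion
```

Note that `inv_cdf` is undefined for proportions of exactly 0 or 1, so real data with a perfect hit rate or no false alarms would need some correction before this step; the text does not say how such cases were handled.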

In order to make possible a comparison of these results with those of the previous study of target detection in rural clutter, d' values were converted to probability of detection (Pd) values in high-threshold theory. The conversion was done by taking as the high-threshold ROC that line that intersects the SDT equal-variance, Gaussian ROC at the point where the response criterion, beta, is 1.0 (at the negative diagonal of the ROC plot). Pd is the y intercept of the high-threshold ROC line.
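One reading of this conversion can be written down directly. The sketch assumes that the high-threshold ROC is the straight line from (0, Pd) to (1, 1), and that at beta = 1 the equal-variance criterion lies midway between the noise and signal distributions. The function name is illustrative, and the sketch follows the verbal description only, so it should not be expected to reproduce every tabulated value exactly.

```python
from statistics import NormalDist

def pd_from_d_prime(d):
    """High-threshold Pd equivalent of an equal-variance Gaussian d'.

    At beta = 1 the criterion sits midway between the two distributions,
    so hit = Phi(d/2) and false alarm = Phi(-d/2), a point on the negative
    diagonal. The high-threshold ROC is taken as the straight line from
    (0, Pd) to (1, 1) through that point; Pd is its y intercept.
    """
    phi = NormalDist().cdf
    hit, false_alarm = phi(d / 2), phi(-d / 2)
    return (hit - false_alarm) / (1 - false_alarm)

for d in (0.0, 1.0, 2.0, 3.0):
    print(d, round(pd_from_d_prime(d), 3))
```

Solving hit = Pd + (1 - Pd) * false_alarm for Pd gives the expression in the return statement; the result is zero at d' = 0 and rises monotonically toward one as d' grows.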


Fig. 3. ROC curves for observers, plotting hit probability against false alarm probability for the four (SCR, RES) conditions: P (0.33, 2), Q (0.33, 4), R (1.2, 1), and S (1.2, 2). (a) Observer 1. (b) Observer 2. (c) Observer 3. (d) Observer 4.

The Pd value obtained by this conversion process is the same value that would be obtained by computing Pd directly from the raw data (using the rational equation for the high-threshold model) if the observer had been behaving so as to maximize the percentage of correct responses. The conversion Pd therefore makes it possible to compare the results of the present study to those of the previous study of rural clutter. The comparison involves the assumption that observers in the previous study [1] were maximizing the percentage of correct responses, which is not unreasonable in a laboratory setting. A Pd value obtained by this conversion process also has the advantage that it is intuitively more meaningful and more widely understood than d', yet it is free of contamination by response biases.

Green and Swets [3, p. 50] note that the area under the ROC curve (as a percentage of the entire ROC plot) is a distribution-free index of the detectability of the signal. Since Pd is monotonically related to the area under the original ROC, it can be viewed as an alternate measure of sensitivity [3, pp. 50, 217]. Therefore use of the conversion Pd with an invariant response criterion does not imply acceptance of the high-threshold model of detection.
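The monotonic relation between the converted Pd and the ROC area can be illustrated numerically. The sketch assumes the equal-variance Gaussian model, for which the area under the ROC is Phi(d'/sqrt(2)), and the same straight-line reading of the high-threshold ROC as in the conversion step. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()

def roc_area(d):
    """Area under the equal-variance Gaussian ROC: Phi(d' / sqrt(2))."""
    return _N.cdf(d / sqrt(2))

def roc_area_numeric(d, steps=2000):
    """Trapezoidal area under hit = Phi(z(fa) + d'), as a cross-check."""
    area, prev_fa, prev_hit = 0.0, 0.0, 0.0
    for k in range(1, steps):
        fa = k / steps
        hit = _N.cdf(_N.inv_cdf(fa) + d)
        area += (fa - prev_fa) * (hit + prev_hit) / 2
        prev_fa, prev_hit = fa, hit
    area += (1.0 - prev_fa) * (1.0 + prev_hit) / 2   # close the curve at (1, 1)
    return area

def high_threshold_pd(d):
    """y intercept of the line through the beta = 1 point and (1, 1)."""
    hit, fa = _N.cdf(d / 2), _N.cdf(-d / 2)
    return (hit - fa) / (1 - fa)

for d in (0.5, 1.0, 2.0, 3.0):
    print(d, round(roc_area(d), 3), round(high_threshold_pd(d), 3))
```

Both columns increase together with d', so ranking conditions by the converted Pd gives the same ordering as ranking them by ROC area under these assumptions.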

IV. RESULTS

Results from the data reduction effort are summarized in Figs. 3 and 4 and in Table IV. Fig. 3 shows the ROC curves for the individual observers with the false alarm probability along the horizontal axis and the hit probability along the vertical axis. There are four curves on each graph: one for each tested stimulus condition. Both the raw data points and the smoothed curve are given for each stimulus condition. Three points are of particular interest in Fig. 3. First, the smoothed curve (based on the equal-variance, Gaussian model of SDT) matches the raw data points quite well. This suggests that this SDT model is indeed appropriate. Second, the probability of a hit for a given false alarm rate increases with image resolution. That is, the curves tend to crowd into the upper left-hand corner (observer sensitivity increases) at higher resolution levels. Third, note that the ordering of the ROC curves (in terms of sensitivity) is consistent across all four observers.

Fig. 4. Probability of detection data in high-threshold formalism (urban data only). Pd is plotted against resolution (lp/tgt) for urban clutter at SCR values of 0.33 and 1.2.

Table IV contains the results of the conversion process from the signal detection theory formalism to the high threshold formalism. Sensitivity values, d', in SDT formalism and the corresponding probability of detection, Pd, in the high-threshold formalism are given. Fig. 4 shows the Pd values graphically. Note the strong influence of resolution level on Pd. Note also that the SCR has very little effect on Pd.

TABLE IV
SENSITIVITY (d') AND PROBABILITY OF DETECTION (Pd) BY CONDITION AND OBSERVER*

Stimulus Condition                 Observer                       Average Across
SCR     Resolution (lp/tgt)        1       2       3       4      Observers
0.33    2                  d'      2.26    1.90    1.86    1.34   1.84
                           Pd      0.89    0.82    0.81    0.66   0.80
0.33    4                  d'      3.20    2.96    3.33    2.73   3.05
                           Pd      0.98    0.96    0.98    0.97   0.97
1.20    1                  d'      1.56    1.08    0.97    1.04   1.16
                           Pd      0.73    0.56    0.51    0.58   0.58
1.20    2                  d'      2.64    1.83    1.79    1.51   1.94
                           Pd      0.94    0.80    0.79    0.71   0.81

*Upper figures are d' values and lower figures are Pd values.

V. DISCUSSION

The results show that image resolution had a strong effect on detection performance while the signal level (relative to background clutter) had little or none. This suggests that spatial relationships (shape and relative location) of features in the image had a greater effect on detection than did the relative luminance of features. Support for this conjecture was found in the debriefing statements of the observers. In particular both relative and absolute spatial cues seemed to play an important role in the decision processes for all of the observers. "Relative spatial cues" refer to the relative size and location of small hot and cold areas in an image. As a result of the practice sessions, the observer inferred roughly how hot and cold target areas should be proportioned. This feature was taken to be a cue as to the presence of a target. Comparisons of the relative size of various candidate areas were also employed. Absolute spatial cues were used in reference to the size of the various scene elements. Specifically a size scale for a given image was inferred from an examination of the roads, and it provided a means of judging the size of objects in the image. Of the two types of cues, relative and absolute, relative spatial cues seemed to play the dominant role in the decision process. Shape information provided additional cues as to the presence or absence of a target and was employed in conjunction with the size information.

Observer debriefings also suggested that radiance information played a role in the observer decision process. In particular, there was a trade-off in importance between the spatial and radiance information which depended on the image resolution. For example, in the highest resolution images (at 4 lp/tgt) the spatial cues seemed to dominate the decision process. In these images the observers looked for specific features of the target and, after identifying candidate targets, used the radiance information to help select the most likely objects as targets. In the lowest resolution images (at 1 lp/tgt) no spatial features of the target were resolved, so the observers first looked for a characteristic radiance signature of the target (i.e., a hot spot). After identifying target candidates, the spatial relationship between the hot spot and the background was factored into the decision. For the intermediate resolution cases, at 2 lp/tgt, neither set of cues dominated; the relative importance of a set of cues depended on the particular image under observation. Target candidates were also eliminated by this combination of spatial and radiance information. From these results it is clear that both radiance and spatial factors play a significant role in the decision processes of human observers.
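The opening claim of this section (a strong resolution effect, little or no SCR effect) can be checked against the per-observer d' values in Table IV. The short sketch below only averages the tabulated values and takes differences between conditions; the dictionary layout and function names are illustrative and are not part of the original analysis.

```python
# Per-observer d' values transcribed from Table IV,
# keyed by (SCR, resolution in lp/tgt).
TABLE_IV_D_PRIME = {
    (0.33, 2): [2.26, 1.90, 1.86, 1.34],
    (0.33, 4): [3.20, 2.96, 3.33, 2.73],
    (1.20, 1): [1.56, 1.08, 0.97, 1.04],
    (1.20, 2): [2.64, 1.83, 1.79, 1.51],
}


def mean_d_prime(scr, resolution):
    """Arithmetic mean of d' across the four observers for one condition."""
    values = TABLE_IV_D_PRIME[(scr, resolution)]
    return sum(values) / len(values)


# Change in mean d' when resolution doubles at a fixed SCR.
resolution_effect_scr_033 = mean_d_prime(0.33, 4) - mean_d_prime(0.33, 2)
resolution_effect_scr_120 = mean_d_prime(1.20, 2) - mean_d_prime(1.20, 1)

# Change in mean d' when SCR rises from 0.33 to 1.20 at a fixed 2 lp/tgt.
scr_effect_at_2_lp = mean_d_prime(1.20, 2) - mean_d_prime(0.33, 2)

print(f"resolution 2 -> 4 lp/tgt at SCR 0.33: {resolution_effect_scr_033:+.2f}")
print(f"resolution 1 -> 2 lp/tgt at SCR 1.20: {resolution_effect_scr_120:+.2f}")
print(f"SCR 0.33 -> 1.20 at 2 lp/tgt:         {scr_effect_at_2_lp:+.2f}")
```

Only the 2 lp/tgt conditions allow a direct SCR comparison, so the SCR effect here rests on a single pair of cells in the table.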

A. Comparison of Detection Performance in Urban and Rural Clutter

Fig. 5 shows a comparison of the present urban results to previous rural data [1]. It was expected that target detection would be more difficult in urban clutter because of the large number of manmade objects and the increased potential for confusing them with targets. In fact, these results tend to support the opposite conclusion. Observations from a comparison reveal the following.

1) For resolutions of both 1 and 4 lp/tgt the detection probability was greater in urban than in rural clutter.

2) At a resolution of 2 lp/tgt, mixed results appear evident. For instance, at an SCR of 1.2 rural performance shows the expected improvement over urban performance. However, at an SCR of 0.33 urban target detection performance is better than rural.

Fig. 5. Detection probabilities for both clutter types. [Plot of detection probability versus resolution (lp/tgt), with curves for urban and rural clutter at SCRs of 0.33 and 1.2.]

An explanation of these results focused on the spatial information in the filtered images; in particular, the effects of the filtering process on the urban scene elements were examined. For all of the scene elements, filtering had the effect of blurring the sharp edges. Large objects tended to retain both size and shape information while the small objects had a tendency to degenerate into indistinct shapes. An observer was aware of these changes to the objects and could reject some of them based on the size cue alone. An observer employed this spatial information both to identify potential target candidates and to eliminate objects from further consideration. This ability to reduce the set of clutter objects had the effect of dynamically redefining the clutter level of a particular image.
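The size dependence of the blurring can be illustrated with a one-dimensional toy example. The filter used to set image resolution in the study is not described in this section, so the sketch below assumes a generic Gaussian low-pass filter as a stand-in; the object widths, kernel width, and function names are all hypothetical. A wide "object" keeps its full peak amplitude and its extent, while a narrow one is smeared into a low, indistinct bump.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel (assumed stand-in for the study's filter)."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, kernel):
    """Convolve a 1-D signal with the kernel, treating samples outside as zero."""
    radius = len(kernel) // 2
    blurred = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        blurred.append(acc)
    return blurred


def box_profile(length, width):
    """Unit-amplitude object of the given width, centered in a zero background."""
    start = (length - width) // 2
    return [1.0 if start <= i < start + width else 0.0 for i in range(length)]


kernel = gaussian_kernel(sigma=3.0, radius=9)
peak_large = max(blur(box_profile(101, 31), kernel))  # hypothetical large building
peak_small = max(blur(box_profile(101, 3), kernel))   # hypothetical small object

print(f"peak after blur, 31-sample object: {peak_large:.2f}")
print(f"peak after blur,  3-sample object: {peak_small:.2f}")
```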

Shape cues based on the edge differences between targets and natural objects (i.e., straight edges versus irregular edges) were important only for sufficiently high resolution images where the blurring was minimal. As the resolution level decreased, the filtering effects tended to decrease this distinction to the point that all objects and targets in the image had approximately the same size and shape. At these resolution levels, the radiance signature differences were the major factor in the decision process.

To pursue these conjectures in more detail, information from the observer debriefings and reviews of the infrared imagery were used to formulate an interpretation of observations noted in the detection probability comparison. In actuality, several interpretations were possible, but the most likely interpretation was one traced to the clutter definition and its adequacy for urban scenes. This interpretation held that, at a certain resolution, the definition of clutter degenerated into an inverse measure, one that corresponded to lower clutter levels for a higher merit function value. For example, from the definition of clutter given in (1), it was seen that higher contrasts from the urban scene elements (buildings, roads, etc.) resulted in a higher clutter level. Yet it was likely that these higher contrasts actually reduced the real clutter perceived because they were more readily recognized for what they were: nontarget scene elements. With higher contrast, defined clutter became higher but the real clutter (the number of confusing objects) became lower. The observer was essentially cueing out the recognizable urban scene elements and was able to view the scene with effectively much less clutter. At such resolutions real clutter was actually inversely proportional to defined clutter.
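The way a purely statistical clutter measure responds to contrast can be sketched numerically. Equation (1) is defined earlier in the paper and is not reproduced in this section, so the code below does not implement it; it assumes a simple block-variance measure (the root mean square of per-cell radiance standard deviations) as a hypothetical stand-in, and the scene, cell size, and function names are likewise illustrative. Doubling the contrast of a single building doubles the assumed measure, even though the number of objects that could be confused with a target is unchanged.

```python
import math


def block_rms_clutter(image, block):
    """Assumed stand-in clutter measure: RMS of per-cell standard deviations.

    The image is divided into block x block cells; the variance of each cell
    is computed, and the square root of the mean variance is returned.
    """
    rows, cols = len(image), len(image[0])
    variances = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cell = [image[r][c]
                    for r in range(r0, min(r0 + block, rows))
                    for c in range(c0, min(c0 + block, cols))]
            cell_mean = sum(cell) / len(cell)
            variances.append(sum((v - cell_mean) ** 2 for v in cell) / len(cell))
    return math.sqrt(sum(variances) / len(variances))


def scene_with_building(contrast, size=8):
    """Hypothetical scene: flat background with one rectangular building."""
    image = [[0.0] * size for _ in range(size)]
    for r in range(1, 3):
        for c in range(1, 7):
            image[r][c] = contrast
    return image


clutter_low_contrast = block_rms_clutter(scene_with_building(1.0), block=4)
clutter_high_contrast = block_rms_clutter(scene_with_building(2.0), block=4)

print(f"measure with building contrast 1.0: {clutter_low_contrast:.3f}")
print(f"measure with building contrast 2.0: {clutter_high_contrast:.3f}")
```

Nothing in such a measure reflects whether the building is recognizable as a nontarget, which is the gap the interpretation above points to.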

Clutter recognizability appeared to have a more pronounced effect in urban clutter than it had in rural clutter. This was because the urban scenes had a strong contextual structure: road grid patterns, buildings in specific places of the grid structure, vehicles in special areas of the grid structure, sharply defined features, etc. Rural clutter, on the other hand, was much more likely to be amorphous and unpredictable in content and form. This would have seemingly prevented clutter object recognition and associated observer cognitive screening of the images. Of course, these cognitive processes were likely at work in both urban and rural scenes, but their effect would have been more pronounced in urban cases.

Such cognitive screening by the observer should be accounted for in the definition of clutter. Otherwise increases in clutter object intensity would merely increase the statistical measure of that clutter. The present clutter definition clearly did not account for the cognitive screening process.

This cognitive screening theory was capable of explaining the trends identified previously. For instance, in the plots of Fig. 5 the clutter definition appeared to be breaking down at resolutions above 2 lp/tgt. This was indicated by the much higher urban clutter Pd than rural clutter Pd at 4 lp/tgt, which was opposite to what one would have expected. However, at 1 lp/tgt, where urban results were not much better than rural, resolution would not be adequate for confusing elements to be recognized as such and discarded from the target candidate set. Finally, at 2 lp/tgt, a clutter definition transition region, the results were ambiguous, as one might have expected without the benefit of a proper clutter definition. Here the urban clutter would be either more or less severe than rural clutter. In this region it was necessary to have a clutter definition that applied equally well to both urban and rural features in order to make specific comparisons and resolve associated ambiguities.

VI. CONCLUSION

From the results of this study several postulates relevant to the detection of targets in cluttered environments can be made. One of the most significant indications from this study is that urban clutter may represent an easier background environment than does rural clutter. This finding leads to the cognitive screening hypothesis. It also has implications for the definition of clutter. For instance, the clutter definition given by (1) does not account for the spatial filtering accomplished by the screening process. This is as it should be if one were comparing the performance of human detection capability with simple machine pattern processors for a given background configuration. However, it would lead to some difficulty if it were applied to general human detection performance.

Differences in performance between urban and rural types, for instance, are likely to be indicative of potential performance differences between other background types such as mountains and deserts. These differences presumably would be context dependent in the sense that they would result from differing amounts of screening cues. Accordingly, a good definition of clutter would still be one that measures the essence of those background features that interfere with the detection process. However, it may also have to account for those interfering features that are readily screened out when they are a part of a larger context. Accordingly, a more universally applicable clutter definition is needed to be able to predict detection performance in differing background types. The search for such a definition may be made more difficult by the need to account for contextual effects.

While these findings and theories are of considerable interest to understanding the vision process, they have to be viewed with considerable caution if based only on the results reported here and in [1]. This is because of several uncertainties that have resulted from the experimental procedures used to date.

The first uncertainty is connected with the limitations inherent in both the original rural and the present urban study. These studies are limited by the practical necessity to confine the investigation to a very few clutter configurations. The terms rural clutter and urban clutter each embody an infinite number of clutter configurations with the same SCR, and it is not known how well the small sample sets used in these studies represent global clutter of either the rural or urban types. In that sense these results may not be repeatable when tested in an identical way against other samples of the same clutter type.

The second uncertainty is related to the decision model used to reduce the raw data. The rural clutter study in [1] reduced data within the framework of the high-threshold theory. The present study, however, used the framework of signal detection theory. While results from the two data reduction methods can be related in principle, in practice additional uncertainty is introduced when comparing the data and in drawing conclusions.
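The difference between the two data reduction frameworks can be made concrete with their textbook formulas. The sketch below assumes the standard high-threshold correction for guessing, p = (H - F)/(1 - F), and the equal-variance Gaussian sensitivity index, d' = z(H) - z(F); the exact procedures used in [1] and in the present study are not given in this section, and the hit and false alarm rates are hypothetical. It shows that two operating points with the same d' yield different high-threshold corrected probabilities, which is one way that uncertainty enters when results reduced under the two models are compared.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def htt_corrected_pd(hit_rate, false_alarm_rate):
    """High-threshold correction for guessing: p = (H - F) / (1 - F)."""
    return (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


def sdt_d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian sensitivity index: d' = z(H) - z(F)."""
    return (STANDARD_NORMAL.inv_cdf(hit_rate)
            - STANDARD_NORMAL.inv_cdf(false_alarm_rate))


# Hypothetical observer with a moderate decision criterion.
hit_a, fa_a = 0.80, 0.20
d_prime_a = sdt_d_prime(hit_a, fa_a)
p_htt_a = htt_corrected_pd(hit_a, fa_a)

# Same sensitivity, stricter criterion placed at the mean of the signal
# distribution: H = Phi(0), F = Phi(-d').
hit_b = STANDARD_NORMAL.cdf(0.0)
fa_b = STANDARD_NORMAL.cdf(-d_prime_a)
d_prime_b = sdt_d_prime(hit_b, fa_b)
p_htt_b = htt_corrected_pd(hit_b, fa_b)

print(f"criterion A: d' = {d_prime_a:.2f}, HTT-corrected Pd = {p_htt_a:.2f}")
print(f"criterion B: d' = {d_prime_b:.2f}, HTT-corrected Pd = {p_htt_b:.2f}")
```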

REFERENCES

[1] D. E. Schmieder and M. R. Weathersby, “Detection performance in clutter with variable resolution,” IEEE Trans. Aerosp. Electron. Syst., vol. 19, pp. 622-630, 1983.

[2] J. Johnson, “Analysis of image forming systems,” in Proc. Image Intensifier Symp., ASTIA Document AD220160, 1958.

[3] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics. New York: John Wiley, 1966.

[4] W. P. Tanner, Jr. and J. A. Swets, “A decision making theory of visual detection,” Psychol. Review, vol. 61, pp. 401-409, 1954.

[5] G. A. Gescheider, Psychophysics: Method, Theory, and Application. Hillsdale, NJ: Lawrence Erlbaum Associates, 1985.

[6] J. M. Cathcart and A. D. Sheffer, “Target and background infrared signature modeling for complex synthetic scenes,” Proc. Soc. Photo-Optical Instrumentation Engineers, vol. 890, pp. 95-103, 1988.

[7] J. W. Peifer, R. B. Raker, M. S. West, and J. H. Andrews, “MAX: Geometric Data Base Editor,” Georgia Tech Research Institute/RAIL, Atlanta, GA, 1985.

[8] E. Botkin, G. Kelley, and G. Gawronski, “Infrared Modeling and Analyses (IRMA),” Report AFATL-TR-81-65, Grumman Aerospace Corporation, Bethpage, NY, 1981.

[9] T. J. Doll, “Human detection of visual targets in cluttered backgrounds,” Proc. Eleventh Biennial Psychol. DoD Symp., US Air Force Academy, Colorado Springs, CO, 1988.

[10] J. M. Cathcart, T. J. Doll, and D. E. Schmieder, “Observer detection performance in urban clutter,” Report AFWAL-TR-87-1157, Georgia Tech Research Institute, Atlanta, GA, 1988.
