Pedestrian Registration in Static Images with Unconstrained Background

Lixin Fan, Kah-Kay Sung and Teck Khim Ng
School of Computing, National University of Singapore
Singapore 117543
[email protected]

Abstract

This paper introduces a human body contour registration method for static pedestrian images with unconstrained background. By using a statistical compound model to impose structural and textural constraints on valid pedestrian appearances, the matching process is made robust to image clutter. Experimental results show that the proposed method registers pedestrian contours in complex backgrounds effectively.

1. Introduction

The pedestrian registration problem is to locate and label the pedestrian contour and features (e.g. head, hands and feet) in a given static image. The registration outcomes are represented with different graphical primitives such as points and lines (see Figure 9 for example results). We are interested in human registration in static images for two reasons. Firstly, human body registration has many potential applications, including image understanding [10] and content-based image indexing and retrieval [35]. The registration results can also be used to initialize a human body tracker, which is a key component in a traffic surveillance system. Secondly, pedestrian images represent a challenging class of highly cluttered images with unconstrained backgrounds. We believe that a successful pedestrian registration algorithm can readily generalize to other object classes (e.g. medical images) that share similar characteristics.

A different yet closely related problem to human body registration is human body detection. Given an input image, a human body detector determines whether there are any human bodies in the image. If the detection is positive, the detector returns the location and size of each human body in the given image. In this work, we assume that pedestrian detection has already been solved by other methods (see e.g. [25, 29]), and our objective is to register the body contour and mark the pedestrian features given the initial location and size of the pedestrian images.

Figure 1: Initial pedestrian training images.

1.1 Related Work

The major difficulty of pedestrian registration lies in dealing with severe image clutter in complex outdoor scenes. Consider the example pedestrian images shown in Figure 1. One can identify many types of image background, ranging from distracting objects (zebra crossings, trees, other pedestrians, etc.) to various lighting conditions and shadows. A successful pedestrian registration method must therefore be able to deal with this image clutter effectively.

Many existing human registration methods designed to work with video sequences attempt to tackle this problem by either assuming a uniform background [20], or eliminating the complex image background using motion [2, 17, 15] or colour cues [34]. In addition, some of them also use interframe correlations to provide temporal constraints on the possible positions of features of interest [36, 30]. These methods, however, are not suitable for pedestrian registration in static images with unconstrained background, because motion, temporal and a priori colour information about pedestrians is not available for static images.

1.2 Our Approach

In this work, we adopt a statistical modeling approach to impose additional structural and textural constraints on valid pedestrian appearances. This approach, which is similar to our varying pose face registration technique described in [12], is able to reliably match pedestrian feature points in cluttered environments.


To model plausible human body appearances and articulations, a statistical object model is used to simulate the various image variations due to changes in object appearance, pose, lighting conditions, etc. It has long been noted that one can learn such an object model from example views of objects by applying Principal Component Analysis (PCA) to training images [32, 22, 24]. A new model image can then be reconstructed as a linear combination of the eigenvectors extracted from the training images. To deal with large structural variations, Beymer, Jones, Vetter and Poggio [5, 16, 33], Craw [9], and Cootes and Taylor [19, 7, 8] proposed to model textural and structural variations separately, and improved the quality of the reconstructed images significantly. In our work, we adopt this technique to learn a compound structural and textural pedestrian image model from training images.

Once the pedestrian image model is learnt, we formulate pedestrian registration in static images as a model-based image matching problem (see Figure 2). The model parameters are re-estimated in such a way that the similarity (error) measure between the input image and the model image is minimized. The pedestrian features are then marked with the model features defined by the optimized parameters. We refer to the overall modeling and registration methodology as view-based object modeling and registration (VOMR). In addition to statistical object modeling, the other two issues to be considered are: (1) how to quantify the differences between given images with a reliable similarity (error) measure; and (2) how to search for the best model parameters using an efficient matching algorithm. In our work, following [12], we adopt a combined feature-texture similarity measure to account for both structural and textural differences between two images. In the process of estimating the pose parameters, we adopt a correspondence-map-based hill-climbing method, which can avoid local minima more effectively and converge quickly.

We note that the view-based object modeling and registration approach is essentially an exemplar based approach. This view-based modeling technique, however, is different from naive exemplar based approaches in two aspects. Firstly, we preprocess example images to attain two types of example data: (1) a shape normalized pedestrian image which has the structural variation removed and captures textural variation only; and (2) a set of feature point correspondence maps (FPCMs) which represent the structural (pose) variations between training images and the prototype pedestrian body. Secondly, instead of storing all the sample data, the statistical modeling technique constructs a compound pedestrian image "model" by applying PCA to the various pedestrian images and correspondence maps, and combining textural and structural variation using an image warping process. Also note that this view-based approach is generic and can be used for other objects by simply changing the training data. Indeed, similar methods have been successfully applied to face registration [7, 12], medical image registration [8] and many others (see [1] for website links).

Figure 2: A model-based pedestrian contour registration approach consists of three components: (1) a compound pedestrian model capturing permissible image variation; (2) a combined feature-texture similarity measure accounting for image differences between the model image and given pedestrian images; and (3) an iterative pedestrian registration algorithm used to find the best model parameters corresponding to the minima of the proposed similarity measure.

Figure 3: Manual registration of training images. (a) Prototype body contour (white dots represent head, hands and feet); (b) Sparse contour point model; (c) Training image manual registration; and (d) Feature point correspondence map (FPCM) between (c) and (b).

Figure 4: Preprocessed pedestrian training images which have the structural variation removed. Also, the image backgrounds are masked.

In Section 2, we adopt statistical modeling techniques to construct a pedestrian image model. Section 3 introduces the similarity measure and matching algorithm used for pedestrian registration. Experimental results of pedestrian contour registration are illustrated in Section 4. We discuss the strengths and limitations of the proposed approach in Section 5, and Section 6 concludes the paper.

2. The Compound Pedestrian Image Model

We use the images from the MIT Pedestrian database described in [25] to learn a pedestrian appearance model. These training images show people in different poses (frontal, rear, walking, and standing), under different lighting conditions, and with unconstrained backgrounds. Some example training images are shown in Figure 1. Adopting the statistical modeling technique, we decouple and model textural and structural image variation separately, and combine them using a compound pedestrian image model. The detailed processing procedures are described below.

2.1 The Preprocessing Stage

To learn the pedestrian model, we first need to decouple the textural and structural variations in the training images. This can be achieved by the following steps.

1. Pedestrian Prototype. We define a prototype body contour, which consists of 70 feature points and 70 line segments (Figure 3 (a)). These feature points represent the head, hands, feet, etc. Figure 3 (b) shows the sparse contour point model, which will later be used to quantify the difference between a given pedestrian image and the warped pedestrian model (see Section 3.1).

2. Manual Registration. We then manually register the prototype with the training images. Notice that when certain human body parts are occluded, we register the closest contour points instead (see Figure 3 (c) for an example). This approximation is good enough for our application.

3. Feature based Image Warping. We apply the feature-based image warping technique to warp the training images with respect to the prototype body contour so that the structural variation of different pedestrian images is removed. We also mask the background regions outside the boundaries of the shape-normalized human body. We refer to [3, 8] for detailed descriptions of the warping process.

Figure 5: Pedestrian textural variation eigenvectors. (a) Mean pedestrian image; (b)-(g) Eigenvectors 1 to 6.

For each pedestrian image, the preprocessing steps generate two types of example data: (1) a shape normalized pedestrian image which has the structural variation removed and captures textural variation only (see Figure 4 for several examples); and (2) a set of feature point correspondence maps (FPCMs) which represent the structural (pose) variation between training images and the prototype pedestrian body (see Figure 3 (d) for an example). Both the shape normalized pedestrian images and the FPCMs are then used to construct the compound pedestrian model.
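The feature-based warping in step 3 can be approximated in a few lines. The following is a crude nearest-neighbour backward warp driven by a dense per-pixel displacement field; it is a stand-in sketch for the Beier-Neely technique of [3], not the authors' implementation:

```python
import numpy as np

def warp_image(img, dx, dy):
    """Backward-warp a grayscale image by a per-pixel displacement field.

    img: (h, w) array; dx, dy: (h, w) displacement fields.
    Each output pixel (y, x) samples the source at (y - dy, x - dx),
    rounded to the nearest pixel and clipped to the image bounds.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - dy).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - dx).astype(int), 0, w - 1)
    return img[src_y, src_x]
```

A constant displacement field simply shifts the image; in the paper's setting the field would be interpolated from the sparse FPCM.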

2.2 Shape-normalized Textural Variation Modeling

Once the training images are shape normalized and masked, we adopt the well-known "eigenface" approach [32, 23] to represent the texture variation only:

I \approx \bar{I} + U_I b_I    (1)

where U_I is an eigenvector matrix of significant eigenvectors obtained by applying PCA on the structurally normalized pedestrian images. The transformation vector b_I = U_I^T (I - \bar{I}) describes the textural variation due to different factors. We refer to the elements of b_I as texture parameters. We also keep the n_I largest eigenvalues \lambda_i, which will be used to impose constraints on the texture parameters b_I (see Section 3.1).

The first 6 eigenvectors are illustrated in Figure 5. It is shown that, given enough training data, the learned model can effectively represent image variations due to different clothing, shadows and lighting conditions.
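Equation (1) and the projection that yields the texture parameters can be sketched as follows; a toy illustration with random data standing in for shape-normalized images (function names are ours, not from the paper):

```python
import numpy as np

def learn_texture_model(images, n_keep):
    """Learn the mean image, eigenvector matrix U_I and eigenvalues
    lambda_i from flattened training images, as in Eq. (1)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # PCA via SVD: the right singular vectors are the eigen-images.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = (s ** 2) / len(images)   # eigenvalues of the sample covariance
    return mean, vt[:n_keep].T, eigvals[:n_keep]

def texture_params(image, mean, U):
    """b_I = U_I^T (I - mean): project an image onto the eigenspace."""
    return U.T @ (image - mean)

# Toy data: 10 random 8x8 "images", keep 6 components.
rng = np.random.default_rng(0)
train = rng.normal(size=(10, 64))
mean, U, lam = learn_texture_model(train, n_keep=6)
b = texture_params(train[0], mean, U)
recon = mean + U @ b   # the model image for these texture parameters
```

With all significant components kept, a training image is reconstructed exactly; truncating to fewer components gives the compact model the paper uses.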

2.3 Structural Variation Modeling

Given a set of example FPCMs between the training pedestrian images and the prototype pedestrian body, one can learn a statistical structural variation model by applying PCA on the example FPCMs. Following [5, 16, 12], an FPCM (F) can be approximated by:

F \approx \bar{F} + U_F b_F    (2)

in which U_F is an eigenvector matrix of significant eigenvectors. We will refer to the elements of the transformation vector b_F = U_F^T (F - \bar{F}) as pose parameters. As in the case of the texture parameters, we also keep the n_F largest eigenvalues \nu_j, which will be used to impose constraints on the structural parameters (see Section 3.1). Figure 6 illustrates the first 6 eigenvectors of the learnt model and Figure 7 depicts synthesized pedestrian shapes using various shape parameters. It is shown that we can effectively simulate different pedestrian poses such as standing, walking and running.

Figure 6: Pedestrian structural variation eigenvectors.

Figure 7: Synthesized pedestrian shapes. Left-right: eigenvectors 1-6.
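Figure 7's synthesized shapes amount to sweeping one pose parameter while holding the rest at zero. A minimal sketch follows; the ±3√ν sweep range is our assumption for a plausible range, not a value stated in the paper:

```python
import numpy as np

def synthesize_shapes(mean_fpcm, U, eigvals, mode, n_steps=5):
    """Sweep one structural eigenvector between -3*sqrt(nu) and
    +3*sqrt(nu) to synthesize plausible shapes (cf. Figure 7)."""
    lim = 3.0 * np.sqrt(eigvals[mode])
    shapes = []
    for c in np.linspace(-lim, lim, n_steps):
        b = np.zeros(U.shape[1])
        b[mode] = c          # vary a single pose parameter
        shapes.append(mean_fpcm + U @ b)   # Eq. (2)
    return shapes
```

Sweeping each mode in turn visualizes which body articulation (e.g. leg spread for walking) each eigenvector encodes.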

2.4 Combining Textural and Structural Variations

Finally, one can combine textural and structural variations to model a pedestrian image. Using the notations in [5, 16, 12], a pedestrian image can be expressed as:

I_m = (\bar{I} + U_I b_I) \otimes (\bar{F} + U_F b_F)    (3)

where U_I, U_F, b_I and b_F are defined above, and the symbol \otimes denotes a feature-based image warping process which essentially shifts pixels in the image according to a given FPCM F [3, 8]. In subsequent sections, we will demonstrate how this model can be used for pedestrian registration.
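Putting equations (1)-(3) together, a compound model image is produced by first synthesizing a shape-free texture and then warping it by the synthesized FPCM. A schematic sketch, where the `warp` argument stands for any feature-based warping routine (e.g. the one from [3]):

```python
import numpy as np

def model_image(mean_tex, U_I, b_I, mean_fpcm, U_F, b_F, warp):
    """Eq. (3): synthesize the texture, synthesize the shape
    displacement map, then combine them by warping."""
    texture = mean_tex + U_I @ b_I    # shape-normalized texture, Eq. (1)
    fpcm = mean_fpcm + U_F @ b_F      # structural displacement map, Eq. (2)
    return warp(texture, fpcm)
```

The two linear submodels stay simple; all the non-linearity of pose change is carried by the warp, which is what lets the compound model cover grossly misaligned images.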

2.5 Model Transformation

During the registration process, the model image in (3) will be translated or scaled to best fit the given pedestrian images. This requires the estimation of the translation t and scaling s. In the proposed matching algorithm, we first initialize t and s based on the detection results of possible pedestrian candidates, then iteratively estimate all the parameters b_I, b_F, t, s (see Section 3.2).

3 Pedestrian Contour Registration

Within the model-based image matching framework, the similarity measure and the matching algorithm are the two important issues, discussed below.

3.1 The Similarity Measure

The following similarity measure, or equivalently the error measure, works effectively for the registration of given pedestrian images against the learnt compound pedestrian image model:

E(I, I_m) = (E_{e1} + E_{e2} + E_t) + (E_b + E_p)    (4)

in which:

E_t = \frac{1}{N} \sum_{x} (I(x) - I_m(x))^2    (5)

E_{e1} = \frac{1}{N_f} \sum_{i=1}^{N_f} d_i^2    (6)

E_{e2} = \frac{1}{N_f} \sum_{i=1}^{N_f} |\vec{n}_i \cdot \vec{t}_i|    (7)

E_b = \frac{1}{n_I} \sum_{i=1}^{n_I} \max(|b_{I,i}| - u\sqrt{\lambda_i}, 0)    (8)

E_p = \frac{1}{n_F} \sum_{j=1}^{n_F} \max(|b_{F,j}| - u\sqrt{\nu_j}, 0)    (9)

We shall explain each term in detail below. E_t stands for the Sum of Squared Differences (SSD) between a given pedestrian image I and the model image I_m, masked by the warped human body shapes. N is the number of image pixels within the boundaries of the warped body shapes. E_{e1} measures the edge spatial difference between the edge map of a given pedestrian image and the warped pedestrian prototype with the current pose parameters b_F. N_f (= 70) is the number of feature points in the sparse pedestrian prototype (see Figure 3); d_i denotes the distance between a point in the warped pedestrian prototype and its nearest neighbour in the edge map of the input pedestrian image. E_{e2} measures the edge directional difference, in which \vec{n}_i is the unit normal vector of feature point i in the warped pedestrian prototype, and \vec{t}_i denotes the unit tangent vector of the nearest neighbouring edge point in the edge map of the given pedestrian image. Note that E_t, E_{e1} and E_{e2} together account for the differences between the given pedestrian image I and the model image I_m.

To impose constraints on the permissible textural and structural variation, we include the last two terms (E_b and E_p) to penalize significant deviations of the model parameters from the learnt pedestrian model. If certain components of the texture parameter vector b_I exceed the range of \pm u\sqrt{\lambda_i}, penalties are imposed, and likewise for the pose parameters. Empirically, we find that the chosen value of u is big enough to capture sufficient structural and textural variation, while small enough to enable the registration algorithm to successfully cope with severe image clutter. Finally, summing everything together, we have the similarity measure in (4).
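Equations (4)-(9) transcribe directly into code, assuming the edge correspondences (distances d_i, unit normals and tangents) have already been computed; variable names, and the default u = 3.0, are our assumptions:

```python
import numpy as np

def penalty(b, eigvals, u=3.0):
    """Eqs. (8)-(9): average overshoot of parameter components
    beyond the +/- u*sqrt(eigenvalue) range. u = 3.0 is an assumed
    default for illustration, not a value from the paper."""
    bound = u * np.sqrt(eigvals)
    return np.maximum(np.abs(b) - bound, 0.0).mean()

def similarity(I, I_m, mask, d, normals, tangents, b_I, lam, b_F, nu):
    """Eq. (4): combined feature-texture error measure."""
    e_t = ((I - I_m)[mask] ** 2).mean()                     # Eq. (5): SSD
    e_e1 = (d ** 2).mean()                                  # Eq. (6): spatial
    e_e2 = np.abs((normals * tangents).sum(axis=1)).mean()  # Eq. (7): directional
    return (e_e1 + e_e2 + e_t) + (penalty(b_I, lam) + penalty(b_F, nu))
```

Note that the directional term is small when contour normals are perpendicular to the nearest edge tangents, i.e. when the model contour runs along the image edges.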

3.2 The Matching Algorithm

For the image matching algorithm, our ultimate objective is to find the best model parameters (b_I, b_F, t, s) which correspond to the minimum of the proposed similarity (error) measure:

(b_I^*, b_F^*, t^*, s^*) = \arg\min E(I, I_m(b_I, b_F, t, s))    (10)

Given the complex form of the proposed similarity measure in (4), it is difficult to obtain an analytical solution of (10). Fortunately, we can seek a numerical solution by adopting the following iterative registration algorithm, in which the different model parameters are estimated in separate steps. In the proposed registration algorithm, step 2 first aligns the structures of the two given images. Then step 3 synthesizes the textural variation due to changes in object appearance, lighting condition, etc. The re-estimation of textural variation, in turn, can lead to more reliable extraction of feature points when we match object structures in subsequent iterations. This iterative estimation is similar in spirit to the Expectation-Maximization (EM) algorithm [27].

Initialize t and s based on the pedestrian detection results;
Initialize b_I and b_F to 0.0;

1. Fix the others, optimize t and s;
2. Fix the others, optimize b_F;
3. Fix the others, optimize b_I;
4. Iterate steps 2 and 3 until convergence.

Note that in steps 1 and 3, we use a general optimization algorithm (the Levenberg-Marquardt method [26]) to estimate the model parameters t, s and b_I. In step 2, however, we make use of a more domain-specific method, called FPCM-based hill-climbing, to estimate b_F. Given the pedestrian image and the current model image, we establish the feature point correspondence map using a simple closest edge point matching method. We then estimate the search direction \Delta b_F by projecting the FPCM onto the eigenshape space, and iteratively re-estimate b_F until convergence:

\Delta b_F = U_F^T F    (11)

In the early iterations, the closest edge point matching may be a poor approximation to the true correspondence. Applying (11) will bring the pedestrian edge points closer to the model contour points. As the iterations continue, the closest-point matching eventually approaches the valid correspondence. The advantage of using FPCM-based hill-climbing is twofold. Firstly, it is more effective in avoiding local minima, since the parameter update \Delta b_F is not determined by gradient descent. Secondly, this domain-specific matching algorithm is deterministic and does not involve any stochastic process; it is therefore more efficient than general-purpose stochastic methods such as simulated annealing [26]. Empirically, the proposed registration algorithm converges in only a few iterations.
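The FPCM-based hill-climbing of Eq. (11) can be demonstrated on a toy 2D shape: match each model point to its nearest target point, form the correspondence map, project it onto the eigenshape space, and update the pose parameters. This is our own illustrative sketch, not the paper's implementation:

```python
import numpy as np

def fpcm_hill_climb(target_points, mean_shape, U, n_iter=10):
    """Iterate closest-point matching and Eq. (11) updates.

    target_points: (m, 2) edge points to fit; mean_shape: flattened
    (x, y) model points; U: eigenshape matrix (columns = modes)."""
    b = np.zeros(U.shape[1])
    for _ in range(n_iter):
        pts = (mean_shape + U @ b).reshape(-1, 2)     # current model points
        # Closest-point correspondences against the target point set.
        dists = np.linalg.norm(pts[:, None, :] - target_points[None, :, :], axis=2)
        nearest = target_points[dists.argmin(axis=1)]
        F = (nearest - pts).ravel()                   # the FPCM
        b = b + U.T @ F                               # Eq. (11) update
    return b

# Toy demo: fit a unit circle model, with one radial "pose" mode,
# to the same points scaled by 1.5.
ang = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
mean = np.stack([np.cos(ang), np.sin(ang)], axis=1).ravel()
U = (mean / np.linalg.norm(mean))[:, None]
target = 1.5 * mean.reshape(-1, 2)
b = fpcm_hill_climb(target, mean, U)
```

Because the update projects the full correspondence field rather than following a local gradient, a single step can move the shape a long way toward the target, which is the local-minima-avoidance property the paper describes.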

4 Experimental Results

In our experiment, we use the MIT Pedestrian image database, which consists of more than 900 pedestrian images with various body shapes and unconstrained backgrounds. The image size is 64 x 128 pixels. From 407 training images in the database, we first construct a pedestrian model with 20 eigenvectors capturing textural variations and 15 eigenvectors representing different poses.

4.1 Measurement Criterion

To measure the performance of the registration algorithm, we define a goodness criterion for each individual registration result as follows:

- Good Registration: We declare a good registration as having both the contour and the feature points (e.g. head, hands and feet) correctly registered.

- Fair Registration: The result is considered a fair registration if it has its contour correctly registered but with 1 up to 5 of the 70 feature points misaligned (i.e. feature points that are more than 5 pixels away from their correct positions). Note that both good and fair registrations are deemed successful registrations.

- Mis-Registration: A mis-registration is declared when the registration algorithm fails to converge, or when the registration output has the body contour and/or more than 5 feature points misaligned.

#Pedestrians   Good          Fair          Mis-registration
500            307 (61.4%)   131 (26.2%)   62 (12.4%)

Table 1: Pedestrian registration results; see the goodness criterion defined in Section 4.1.
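The criterion above maps directly to a small decision function; the threshold values come from the text, while the function name is ours:

```python
def grade_registration(converged, contour_ok, n_misaligned):
    """Classify one registration result per the Section 4.1 criterion.

    A feature point counts as misaligned if it is more than 5 pixels
    from its correct position; more than 5 misaligned points (or a bad
    contour, or non-convergence) means mis-registration."""
    if not converged or not contour_ok or n_misaligned > 5:
        return "mis-registration"
    if n_misaligned == 0:
        return "good"
    return "fair"   # 1 to 5 of the 70 feature points misaligned
```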

We run our proposed registration algorithm on two test databases. The performance of the proposed method is characterized by the ratios of the numbers of good, fair and mis-registrations to the total number of human bodies in each database.

4.2 Experiment 1

In this experiment, we tested the proposed pedestrian contour registration method on 500 test images from the MIT Pedestrian image database. The purpose of this experiment is to evaluate the performance of the proposed pedestrian registration algorithm working with an ideal pedestrian detector, which outputs accurate initial translation and size parameters and has no false detections.

Some registration results are illustrated in Figure 9. We note that the registration is generally robust to changes in body texture (e.g. varying clothing and shadow) and pose (e.g. frontal and rear, standing and walking). Furthermore, it is shown that the proposed registration algorithm is effective even when severe clutter (e.g. zebra crossings, trees, and other pedestrians) is present in the unconstrained background. Table 1 summarizes the success rate of the pedestrian registration algorithm. Among the 500 test images, 307 (61.4%) pedestrians are well registered; 131 (26.2%) images are fairly registered with a few feature point misalignments; and 62 (12.4%) images are mis-registered.

4.3 Experiment 2

In the second experiment, we test the proposed algorithm using 30 images with more complex scenes (see Figure 11 for some examples). These images contain both pedestrians and people engaged in different activities (e.g. a wedding, playing croquet). The purpose of this experiment is to evaluate the performance of the proposed registration algorithm working with a real pedestrian detector, which may produce false detections and inaccurate estimates of the size and position of people. We first use the pedestrian detection method proposed in [29] to locate 42 people of different appearance and pose in these images.¹

¹The pedestrian detection method missed 5 people in the test images. When evaluating the registration algorithm, we did not include these 5 misses.

The experimental results show that the proposed human body registration algorithm rejects 1 (2.4%) false detection, successfully registers 34 (81.0%) and mis-registers 7 (16.7%) of the 42 detected people. Figure 11 illustrates some registration results for 9 people. A detailed inspection shows that one pedestrian in image A is mis-registered due to a large pose variation, and the feet of the wedding couple in image C are partially mis-registered due to the lack of reliable features.

5 Discussion

An exemplar based object representation often consists of example 2D views spanning a variety of poses, lighting conditions and shape deformations. For instance, in the view-based system of Breuel [6], two toy airplane models are represented by 32 views sampled from the upper half of the viewing sphere. In a pose independent face recognition problem, Beymer [4] showed that, for identification purposes, one can reliably represent human faces with 15 views of varying pose. One key challenge in the exemplar based approach is to capture and represent the possible variation in pose, lighting and shape deformation using as few example images as possible. The view-based eigenspace method [31, 18, 32, 23] is one such efficient approach, using principal component analysis (PCA) [11, 14] to construct a compact representation from a large set of object images.

While the view-based eigenspace approach has proven efficient in modeling fixed pose objects, it has difficulty representing object images which are grossly misaligned. Nayar et al. [24] have demonstrated that the distribution of varying pose objects often forms a non-convex yet connected region in the high-dimensional image space. This complex distribution violates the underlying Gaussian distribution assumption of the eigen-subspace approach, and therefore cannot be adequately captured within a low-dimensional subspace. To represent the distribution of varying pose object images, Pentland et al. [22] and Schneiderman [28] proposed to use multiple pose-dependent eigen-subspace models. However, this approach often involves collecting example images for each individual subspace model, and it is difficult to cover arbitrary variation in pose.

Following [5, 7, 8, 9, 16, 19, 33], we decouple and model "textural" and "structural" variations separately, and combine both types of variation using an image warping process. The image warping process actually introduces a nonlinear transformation on the distribution of fixed pose object images. While the two submodels simply represent two elliptical Gaussian distributions of textural and structural variation, the combination of these two components results in a much more complex manifold in image space. As an example, Figure 8 illustrates the distribution of a set of frontal face images and their warped counterparts synthesized by the proposed compound model [13]. One can see that the overall distribution forms a non-convex but connected region. We believe that the decoupling and combination of textural and structural variation allows us to reliably and efficiently represent grossly misaligned images with a parametric model. Finally, this view-based object modeling and registration approach requires that the object shape and distinctive feature points be unambiguously identified in the example object images. Therefore, this method cannot be applied to objects (e.g. cells) which have no well-defined shape or distinctive features.

Figure 8: Nonlinear distribution of varying pose face images. Each data point represents a synthesized face image projected into the subspace spanned by the 3 most significant eigenvectors. (Pose 1: right rotated; pose 2: slightly right rotated; pose 3: frontal; pose 4: slightly left rotated; and pose 5: left rotated.)

6 Conclusions and Future Work

Working within a generalized view-based object modeling and registration (VOMR) framework, we tackle the problem of pedestrian contour registration in static images with complex background. By using a statistical compound model to impose structural and textural constraints on valid pedestrian appearances, the matching process is made robust to image clutter. Experimental results show that the proposed method successfully registers complex pedestrian contours even when the image backgrounds are heavily cluttered.

We note that mis-registrations are mainly due to two reasons. Firstly, some images are too blurry for meaningful features to be extracted reliably (see Figure 10 (a)). To extract more robust features, one could perform perceptual grouping on noisy features as suggested in [21]. Secondly, there is too much variation in some pedestrian shapes (see Figure 10 (b)). To handle a wider range of pose variations, one can include more training examples and possibly use a mixture-of-Gaussians model. This is one of our current research directions.

Dedication

In memory of Kah-Kay Sung.

References[1] http://www.wiau.man.ac.uk/ � bim/asm links.html.

[2] A. M. Baumberg and D. C. Hogg. Learning Flexible Modelsfrom Image Sequences. Technical report, Division of Artifi-cial Intelligence, School of Computer Studies, University ofLeeds, October 1993.

[3] T. Beier and S. Neely. Feature-based image metamorpho-sis. In SIGGRAPH’92 Proceedings, pages 35 – 42, 3992.Chicago, IL.

[4] D. Beymer. Face Recognition under Varying Pose. AI Lab,Memo 1461, MIT, 1993.

[5] D. Beymer. Vectorizing Face Images by Interleaving Shapeand Texture Computations. AI Lab, Memo 1537, MIT, Sept.1995.

[6] T. M. Breuel. An Efficient Correspondence based Algorithmfor 2D and 3D Model based Recognition. AI Lab, Memo1259, MIT, 1993.

[7] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active Ap-pearance Models. In H. Burkhardt and B. Neumann, editors,Proceedings of the European Conference on Computer Vi-sion, volume 2, pages 484–498, 1998.

[8] T. F. Cootes and C. J. Taylor. Statistical Models ofAppearance for Computer Vision. Technical report,Wolfson Image Analysis Unit, University of Manchester,(http://www.wiau.man.ac.uk), Dec. 2000.

[9] I. Craw, N. Costen, T. Kato, and S. Akamatsu. HowShould We Represent Faces for Automatic Recognition?IEEE Trans. Pattern Analysis and Machine Intelligence,21(8):725–736, August 1999.

[10] L. Davis, D. Harwood, and I. Haritaoglu. Ghost: A Human Body Part Labeling System Using Silhouettes. In ICPR98, page SA11, 1998.

[11] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.

[12] L. X. Fan and K. K. Sung. A Combined Feature-Texture Similarity Measure for Face Alignment Under Varying Pose. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 308–313, 2000.

[13] L. X. Fan and K. K. Sung. Model-Based Varying Pose Face Detection and Facial Feature Registration in Video Images. In Proc. of ACM Multimedia, pages 295–302, Los Angeles, USA, 2000.


[14] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.

[15] B. Heisele and C. Wohler. Motion-based Recognition of Pedestrians. In Proc. of IEEE International Conference on Pattern Recognition, pages 1325–1330, 1998.

[16] M. J. Jones and T. Poggio. Model-Based Matching by Linear Combinations of Prototypes. AI Lab, Memo 1583, MIT Artificial Intelligence Laboratory, Nov. 1996.

[17] S. O. Ju, M. J. Black, and Y. Yacoob. Cardboard People: A Parameterized Model of Articulated Motion. In Proc. of International Conference on Automatic Face and Gesture Recognition, pages 38–44, Los Alamitos, California, Oct. 1996.

[18] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces. IEEE Trans. Pattern Analysis and Machine Intelligence, 12:103–108, 1990.

[19] A. Lanitis, C. J. Taylor, and T. F. Cootes. Automatic Interpretation and Coding of Face Images using Flexible Models. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):743–756, July 1997.

[20] M. K. Leung and Y.-H. Yang. First Sight: A Human Body Outline Labeling System. IEEE Trans. Pattern Analysis and Machine Intelligence, 17(4):369–397, April 1995.

[21] G. Medioni, M.-S. Lee, and C.-K. Tang. A Computational Framework for Segmentation and Grouping. Elsevier Science, New York, 2000.

[22] B. Moghaddam and A. Pentland. Face Recognition using View-based and Modular Eigenspaces. In Proc. of SPIE, pages 12–21, 1994.

[23] B. Moghaddam and A. Pentland. Probabilistic Visual Learning For Object Representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):696–710, July 1997.

[24] S. Nayar and H. Murase. Dimensionality of Illumination Manifolds in Appearance Matching. In Int. Workshop on Object Representations for Computer Vision, 1996.

[25] M. Oren, C. Papageorgiou, and T. Poggio. Pedestrian Detection Using Wavelet Templates. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 193–199, 1997.

[26] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, second edition, 1992.

[27] R. A. Redner and H. F. Walker. Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Review, 26:195–239, 1984.

[28] H. Schneiderman. A Statistical Approach to 3D Object Detection Applied to Faces and Cars. PhD thesis, Carnegie Mellon University, May 10 2000.

[29] H. Setyawan. Model-based Human Detection in Images. Master's thesis, National University of Singapore, 2001.

[30] H. Sidenbladh, M. J. Black, and D. J. Fleet. Stochastic Tracking of 3D Human Figures Using 2D Image Motion. In Proceedings of the European Conference on Computer Vision, 2000.

[31] L. Sirovich and M. Kirby. Low-dimensional Procedure for the Characterization of Human Faces. Journal of the Optical Society of America, 4(3):519–524, March 1987.

[32] M. A. Turk and A. P. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.

[33] T. Vetter, M. J. Jones, and T. Poggio. A Bootstrapping Algorithm for Learning Linear Models of Object Classes. AI Lab, Memo 1600, Feb. 1997.

[34] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland. Pfinder: Real-Time Tracking of the Human Body. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):780–785, July 1997.

[35] Y. Xu, E. Saber, and A. Tekalp. Object Formation by Learning in Visual Database using Hierarchical Content Description. In ICIP99, page 26PO2, 1999.

[36] Y. Yacoob and L. Davis. Learned Temporal Models of Image Motion. In Proc. of IEEE International Conference on Computer Vision, pages 446–453, 1998.


Figure 9: Pedestrian registration results. Rows 1 and 2: good alignments. Row 3: fair alignments. Note that severe clutter (e.g. zebra crossing, trees, and other pedestrians) is present in the background.


(a) (b)

Figure 10: Pedestrian mis-registration examples. Image (a) is misregistered because it is too blurry for reliable features to be extracted. The person in image (b) is misregistered because his left hand is placed on his head, exhibiting too much pose variation for the learnt pedestrian model.

Figure 11: Human body registration results in complex scenes.
