A System for Indoor 3-D Mapping and Virtual Environments

S. F. El-Hakim, P. Boulanger, F. Blais, and J.-A. Beraldin

Institute for Information Technology, National Research Council, Ottawa, Ontario, Canada K1A 0R6

ABSTRACT

The key to navigating in a 3-D environment, or to designing autonomous vehicles that can successfully maneuver and manipulate objects in their environment, is the ability to create, maintain, and effectively use a 3-D digital model that accurately represents its physical counterpart. Virtual exploration of real places and environments, whether for leisure, engineering design, training and simulation, or tasks in remote or hazardous environments, is more effective and useful if geometrical relationships and dimensions in the virtual model are accurate. A system that can rapidly, reliably, remotely, and accurately perform measurements in three-dimensional space for the mapping of indoor environments is needed for many applications. In this paper we present a mobile mapping system that is designed to generate a geometrically precise three-dimensional model of an unknown indoor environment. The same general design concept can be used for environments ranging from simple office hallways to long winding underground mine tunnels. Surfaces and features can be accurately mapped from images acquired by a unique configuration of different types of optical imaging sensors and a dead reckoning positioning device. This configuration guarantees that all the information required to create the three-dimensional model of the environment is included in the collected data. Sufficient overlaps between two-dimensional intensity images, in combination with information from three-dimensional range images, ensure that the complete environment can be accurately reconstructed when all the data are simultaneously processed. The system, the data collection and processing procedure, test results, and the modeling and display at our virtual environment facility are described.

Keywords: Range sensors, modeling, virtual environments, mobile mapping, calibration, registration, photogrammetry.

1. INTRODUCTION

In this section, we give a brief overview of virtual environment (VE) technology and describe some existing 3-D mapping and navigation systems that may be considered for the creation of VEs. For reasons discussed below, these systems are not suitable for creating accurate VEs; we therefore propose a system combining a range sensor with a configuration of CCD cameras and employing a photogrammetric bundle adjustment with effective constraints.

1.1. Overview of virtual environment technology

There is no general agreement on the definition of virtual environments, so the terminology varies depending on the source. The most accepted definition specifies that a computer-generated, or simulated, world must allow the user to view it from any point or direction and to interact with its objects. Thus, virtual environments are defined as real-time graphics interaction with three-dimensional models, combined with a display technology that gives the user immersion in the model world and direct manipulation [1]. The technology will radically change the way people interact with computers and allow them to act as if they were in places they are not, enabling experiences otherwise not possible by simply observing a graphical display. Until recently the entertainment industry has been the leading market; however, many other applications, such as training using simulators, industrial design and virtual prototyping, medical, and military applications, are now employing the technology. There are many useful publications that serve as introductions to the topic and document the state of the technology [2-6], so only a brief overview is given here.

Virtual exploration of remote places and environments, whether for leisure, engineering design, simulations, or performing tasks in hazardous environments, is more effective and useful if geometrical relationships and dimensions in the virtual model are accurate [7-9]. Also, since in a VE the rendering of images must respond immediately to the user's movements, the relationship between the viewer and the 3-D environment must be continuously and accurately known. This is also true for interacting with and manipulating objects in that environment. The required accuracy of the virtual model varies widely with the application, and even within an application the accuracy requirements may vary.

For example, the required accuracy of the spatial location and orientation of doors and openings through which the viewer or a vehicle will pass is higher than that of wall details. On the other hand, some surfaces may need high-resolution texture maps rather than high geometric accuracy.

Currently, creating VE models remains a challenge and limits the use of the technology in a wide range of useful applications. The "computer-generated" environment should be a truthful representation of the "real" environment if precise, well-calibrated sensors or cameras are used to digitize the latter to create the former. However, for several reasons such as availability and cost, most models are built by using standard geometric primitives, libraries of pre-modeled objects, or manual digitizing of every point. Building such a model graphically for a detailed environment takes enormous effort and time, and the result may look unrealistic. It is also unsuitable for applications requiring accurate representation of existing objects and sites. Thus, real-world image-based VEs can advantageously complement or replace artificially created VEs in many endeavors. Models suitable for virtual environments differ from those designed for other graphical environments, such as CAD systems. The main constraint is the real-time requirement: the software must, in real time, update the display according to the viewer's movement. Since each polygon in the model usually consists of at least three vertices, each with 3-D coordinates and a color or shading value, it is not difficult to imagine the huge amount of data required for even a simple scene. The real-time requirement therefore dictates the maximum number of polygons in the model, and thus the level of detail that can be handled. The model must also include information on object behavior (response to user action), a script, and the hierarchical relationships between objects, as well as a multi-resolution hierarchical representation of all its components. For example, when objects are farther than a given distance from the user, it is sufficient to display them at a decreased resolution, up to a distance beyond which the objects are not displayed at all.
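To make the multi-resolution idea concrete, the following minimal sketch (our illustration only; the thresholds, mesh names, and data structure are hypothetical and not part of the system described in this paper) shows how a distance-based level-of-detail rule of the kind described above might be expressed:

    # Illustrative sketch of distance-based level-of-detail selection
    # (hypothetical thresholds and data structure).
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class LODObject:
        name: str
        meshes: List[str]           # mesh names ordered from finest to coarsest
        lod_distances: List[float]  # distance (m) up to which each mesh is used
        cull_distance: float        # beyond this distance the object is not drawn

    def select_mesh(obj: LODObject, viewer_distance: float) -> Optional[str]:
        """Return the mesh to render for the current viewer distance, or None to cull."""
        if viewer_distance > obj.cull_distance:
            return None                      # too far away: not displayed at all
        for mesh, d in zip(obj.meshes, obj.lod_distances):
            if viewer_distance <= d:
                return mesh                  # first resolution whose range covers the viewer
        return obj.meshes[-1]                # fall back to the coarsest resolution

    # Example: a door modeled at three resolutions (made-up values).
    door = LODObject("door", ["door_fine", "door_medium", "door_coarse"],
                     [2.0, 10.0, 30.0], cull_distance=60.0)
    print(select_mesh(door, 1.5))    # door_fine
    print(select_mesh(door, 25.0))   # door_coarse
    print(select_mesh(door, 80.0))   # None (culled)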

There are many different sensor technologies that can be used to generate 3-D data, either directly, such as range sensors, or indirectly, such as CCD cameras. A single type of sensor may not provide sufficient data to completely reconstruct an entire environment. The 3-D data captured by the different types of sensor are usually complementary; each can accurately provide data on only certain types of surface or scene subsets [10]. For example, range data from a laser scanner can be accurate and complete over a continuous surface but may produce erroneous results on surface discontinuities such as edges, or on specular surfaces. By contrast, intensity images produced by CCD cameras cannot provide complete or accurate 3-D measurements on unmarked continuous surfaces, but can provide high accuracy on features and edges. Since the geometric properties required by VE applications include all surfaces as well as edges and features, both types of data should be integrated to provide these geometric properties completely and accurately. Requirements versus cost is also a major factor when selecting the most appropriate sensor or combination of sensors.

Besides the 3-D model, the VE system includes the head trackers, the image-rendering engine, and the 3-D display. The remainder of this paper focuses primarily on the creation of VE models from sensor data.

1.2. Overview of mobile mapping and navigation technologies

Except for small environments, a mobile platform is required to collect data for mapping and navigation. Mapping systems require precise calibration and must produce geometric models in one global reference system; in those systems the data processing is usually not required to be in real time. By contrast, navigation systems do not require a global reference system, since local coordinate systems and relative geometry are sufficient for navigation; however, speed of processing is critical since navigation decisions have to be made in real time. In the following sections, a review of both types of system is given and their limitations for accurate indoor mapping are pointed out.

1.2.1. Mobile mapping

Mobile mapping combines absolute and relative positioning devices and imaging sensors to locate features in a global reference system [11]. The absolute positioning sensors provide the framework for all other data collected by the relative positioning sensors. GPS is used in most mobile mapping systems that claim high absolute accuracy. Other positioning sensors such as INS and/or dead-reckoning devices are also required for positioning and orientation, to fill in the gaps when the GPS signal is interrupted or temporarily obstructed [12]. When the data from these sensors are integrated properly, continuous and accurate tracking of the absolute position of the sensors becomes possible. In order to determine the position of the features and details to be mapped, imaging sensors such as CCD cameras are used. These cameras determine the feature position relative to the camera. Since the camera's absolute position in a global coordinate system is known from the absolute positioning sensors, any feature mapped relative to the camera can be transferred to this global system. The final accuracy is a function of the accuracy of each of the individual sensors, the calibration and registration of the sensor positions relative to

each other, and the rigorous processing and integration of the data. Currently the best achievable accuracy reported from existing systems, using state-of-the-art GPS/INS/digital CCD cameras, is about 1 meter (RMS) in the global framework and much better for the relative position between features [11-15].
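As a hedged illustration of the georeferencing step just described, the sketch below (with entirely hypothetical pose and feature values) transfers a feature position measured relative to the camera into the global frame using the camera's absolute position and orientation:

    import numpy as np

    def rotation_from_yaw_pitch_roll(yaw, pitch, roll):
        """Rotation matrix from heading (yaw), pitch, and roll in radians (one common convention)."""
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
        Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
        Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
        return Rz @ Ry @ Rx

    # Hypothetical values: camera pose in the global frame (from GPS/INS) and a
    # feature position measured relative to the camera (from image measurements).
    camera_position = np.array([4512.30, 872.15, 101.40])     # global XYZ (m)
    R = rotation_from_yaw_pitch_roll(np.radians(35.0), 0.0, 0.0)
    feature_in_camera = np.array([2.1, 0.4, 8.7])             # metres, camera frame

    feature_in_global = camera_position + R @ feature_in_camera
    print(feature_in_global)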

The main application areas of these systems are road mapping, structural clearance measurements along railways, and aerial topographic mapping. In those applications, the data collection must be performed at normal vehicle speed, while most of the data processing is carried out off-line. There is normally an operator on board who not only operates the vehicle but also performs interactive functions related to data collection. Since most of the existing prototypes for mobile mapping systems do not use laser scanners, many surface details will not be detected.

1.2.2. Navigation

In navigation systems, imaging sensors are used to track points and features in order to estimate the vehicle trajectory and motion, and to reconstruct and recognize objects [16-19]. Unlike mobile mapping systems, high absolute position accuracy is not required for navigation. Only relative geometry, or a local map, is needed; thus most existing systems do not have to be rigorously calibrated. On the other hand, the vehicle is fully autonomous and the data must be processed in real time, so rigorous solutions are usually not possible. It is therefore important to distinguish between systems designed for mapping, where absolute position accuracy is the critical issue, and systems designed for navigation or locomotion, where autonomy and speed of processing and decision-making are the major concerns.

Sensors used for navigation include passive CCD cameras and/or 3-D range sensors, Doppler radar, sonar, and dead-reckoning devices. Devices that require long, static observations are typically not useful for navigation. Reported performance evaluations are in terms of resolution and relative accuracy, not absolute or global accuracy. The best relative accuracy reported is about 0.15 m in the local vicinity of the vehicle, which is consistent with the sensor resolution or noise level.

1.2.3. Sample of mobile mapping and navigation prototypes

The literature reports a spectrum of mobile mapping and navigation systems, the majority of which are either at the prototype stage or still at a design/simulation stage. It is beyond the scope of this paper to evaluate all existing systems, so only a representative sample is mentioned. We restrict the sample to land systems (i.e., excluding air and underwater systems) that are at least at the prototype stage and were tested in realistic settings.

Table 1: Sample of mobile mapping and navigation systems

Organization                                 | System Name             | Sensors                                                                               | Application
Ohio State U., USA                           | GPSVan [11]             | GPS, INS, wheel counter, CCD & color video cameras                                    | Road mapping
U. Armed Forces, Germany                     | KiSS [14]               | GPS, INS, odometer, altimeter, CCD & color video cameras                              | Road mapping
Tech. School Aachen, Germany                 | Surveying Vehicle [15]  | GPS, wheel sensors, barometer, stereo CCDs                                            | Road mapping
Geofit Inc., Canada                          | VISAT [13]              | GPS, INS, CCD & color cameras                                                         | Road mapping
CMU, USA                                     | Ambler [16]             | Laser radar scanner, dead reckoning                                                   | Locomotion
CMU, USA                                     | Navlab [18]             | Stereo CCDs, laser radar scanner, Doppler, sonar                                      | Navigation
Several companies and universities (ARPA)    | UGV [19]                | Color and infrared stereo video cameras, laser radar, GPS, tilt meters, flux compass  | Navigation / recognition (military)

Table 1 displays the properties of four mapping and three navigation systems. The mapping systems, which claim high accuracy, are designed mainly for road mapping and are not suitable for indoor applications since they rely on GPS for global positioning. They also use stereo vision alone to obtain the relative positions of features, so only a partial 3-D map can be constructed. The navigation systems, although they do not use GPS and do incorporate active range sensors, do not construct a global map and do not require absolute accuracy. We argue in the next section that the proposed solution solves the main problems of the existing mapping and navigation systems for indoor applications.

1.3. Problem statement and the NRC solution

One of the main problems in indoor mapping, particularly for large sites, is error propagation in the absence of GPS or control points. All measurements are made relative to the mobile platform, while the absolute position is not accurately known because of the inaccuracy (drift, i.e. error that increases rapidly over time) of dead-reckoning devices (wheel odometers) or inertial navigation systems. Our solution is specifically designed to minimize this error propagation. It has the following features:

1. Solve all data simultaneously, rather than sequentially, using dead-reckoning data only as initial values and using range sensor data as constraints.

2. Use a sufficient number of cameras whose relationships to one another, in one position, are known from a calibration procedure. These known relationships are used as additional constraints.

3. Design the external camera configuration to provide a strong geometric solution, thus reducing error propagation.

In the following sections, the system's main components, the 3-D mapping procedure, test results and analysis, and concluding remarks and recommendations are presented.

2. SYSTEM COMPONENTS AND LAB FACILITIES

The system [20] has been designed to implement the features mentioned above. Figure 1 shows the main components of the system. The sensor configuration is designed for coverage of the side walls and ceiling of an indoor site. Modifications can easily be made to cover other views, for example a front view. The system consists of a data acquisition platform, a calibration structure, a workstation, and a VE display facility. A relatively large site has been constructed to test the system. Each component is described in some detail in the following sections.

[Figure 1 labels: Biris; mount frame for 8 CCD cameras and a scanning Biris range sensor; calibration frame; control points for calibration; dimensions 4', 16', 36', 9.5'.]

Figure 1: (a): the sensors and mounts on the data acquisition platform, (b): the calibration structure, (c): detailed view of the circle in (b), (d): the test site, (e): modeling workstation, and (f): the virtual-environment display.

2.1. The Data Acquisition Platform (DAP)

Figure 2 shows the mobile data acquisition platform (DAP) and the mounted cameras. There are eight CCD cameras and one Biris range sensor mounted on a pan and tilt unit. A PC dedicated to acquiring and storing the images is also shown. The following components were used to construct the DAP:

• A mobile platform. This is a Cybermotion model K2A+ with a dead reckoning system that provides relative position (X, Y) and yaw (rotation angle about its vertical axis) feedback as calculated from the wheel encoders. The mobile platform can be operated remotely via the computer network or manually using a joystick.

• Eight RS-170-standard analogue CCD cameras, each equipped with an 8-mm C-mount lens.

• A Biris head, model CHIN/S, designed and calibrated for ranges between 0.5 m and 4.0 m. The head is mounted on a pan and tilt unit. In any robot position, the pan and tilt unit moves the Biris head so that two slightly overlapped scans are taken on each side of the robot. These scans cover approximately the same views as the CCD cameras.

• The computer: a 133-MHz Pentium PC with 32 MB of RAM, a 1.7-GB hard drive, and a 17" monitor.

• Image acquisition boards: two Matrox Pulsar image acquisition boards installed on the PCI bus of the PC. Each board has inputs for four video cameras.

• Software modules: these control all the data acquisition devices; they acquire images from the CCD cameras and Biris, control the pan and tilt unit, acquire data from the dead reckoning system, and store all data on the PC. The only input to this software is the number of positions the robot will move to. Once this number is entered, the program acquires and stores the data for each position without human intervention.

• Sturdy mounts for the cameras, the pan and tilt unit, and the PC.

• Ethernet link, required for direct access between the acquisition PC and the other processing and display computers.

The Biris range sensor and its mounting unit will be described in more detail in the next two sections.

Figure 2: The data acquisition platform

2.1.1. The 3-D range sensor: BIRIS

One of the unique components of this system is the range sensor, Biris (named for bi-iris), which is specifically designed to acquire accurate data over a wide range of distances (between 0.5 m and 4.0 m for the model used in the tests described in this paper). The basic optical principle of the range sensor is shown in Figure 3, with modifications shown in Figure 6 that will be explained later. The technique uses a modified Biris principle that includes a standard plane-of-light method to increase both the range and the depth of field of the basic technique [21]. Because of the two-aperture mask and the defocusing of the image, range is obtained either by measuring the average position p = p1 + p2 of the two laser spot images p1 and p2 on the CCD photo-detector (plane-of-light method), or by measuring their separation b = p2 - p1 (Biris method).
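For readers unfamiliar with laser triangulation, the sketch below illustrates the generic textbook relation between spot position on the detector and range; it is an idealized model with assumed parameters, not the calibrated Biris range equation.

    import numpy as np

    # Generic single-spot laser triangulation (a textbook relation, *not* the
    # calibrated Biris model): a laser mounted at baseline D from the lens
    # produces a spot whose image position p on the detector varies with
    # range z approximately as p = f * D / z, hence z = f * D / p, where f is
    # the effective focal length. The Biris two-aperture mask forms two spot
    # images p1 and p2 whose separation b = p2 - p1 varies with range in a
    # similarly monotonic way and is what the sensor actually measures.
    def range_from_spot_position(p_pixels, pixel_pitch=0.01e-3, f=0.016, D=0.10):
        p = p_pixels * pixel_pitch        # spot position on the detector (m)
        return f * D / p                  # range (m)

    for p_px in (40.0, 100.0, 320.0):     # hypothetical spot positions (pixels)
        print(p_px, "px ->", round(range_from_spot_position(p_px), 3), "m")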

The Biris sensor itself has already been demonstrated in applications of mobile navigation and environment modeling and mapping [21, 22]. However, for a large volume of measurements with a large depth of focus (e.g. 0.5 m < z < 4 m), such as those expected within indoor virtual environments, the defocusing of the laser images on the CCD can become prohibitive. Even though the depth of focus is large because of the small aperture mask of the lens, the 3-D image accuracy and resolution are slightly affected by the image blur. At extreme ranges, the performance of the conventional method may start to degrade. Although this did not create major difficulties for navigation [22], it is a problem for the creation of virtual environments.

In order to reduce this effect, one can focus the lens such that the depth of focus is maximized, but then a sign ambiguity in the measurement b = p2 - p1 of the Biris method is present, as illustrated in Figures 4 and 5. As exemplified in Figure 5, it is difficult to resolve and accurately differentiate the laser spot positions p2 and p1 at extreme ranges.

Figure 3: Optical principle of the Biris/plane-of-light method (laser, object, two-aperture mask, and CCD; the two spot images p1 and p2).

Figure 4: Sign ambiguity of the basic method for a large volume of measurements (b > 0, b = 0, b < 0).

Figure 5: Zoom of Figure 4.

Figures 6 and 7 illustrate the solution adopted for resolving the sign ambiguity of the measurements [23, 24]. Splitting the imaging lens in two [23] or using an axicon [24] automatically creates a small offset/separation in the peak positions. The depth of focus is then doubled compared to the original method and the sign ambiguity is resolved.

Figure 6: Extension of the depth of focus by splitting the lens.

Figure 7: Zoom of Figure 6.

2.1.2. Mounting devices: Pan-Tilt Scanning Unit

In order to completely cover the surroundings of the vehicle, the range sensor was mounted on a pan-tilt scanning unit from Directed Perception Inc. The pan-tilt unit (Figures 8 and 9) can be used to scan the 3-D laser profile over a 360° pan angle and an 85° tilt angle. The scanning parameters as well as the image resolution are computer controlled and therefore fully programmable. Different modes of low-resolution and high-resolution imaging can be preprogrammed to almost completely cover the surroundings of the vehicle.

Figure 8: Photograph of the 3-D range sensor and its pan-tilt unit.

Figure 9: Description of the 3-D scanning unit (lasers, range camera, lens, pan motor, tilt motor, and laser profile).

2.2. The calibration and registration structure (CRS)

One of the most critical steps in the mapping procedure is the calibration of the sensors. The aim of calibration is to find the parameters defining the relationship between the sensor measurements, in their local coordinate system in image space, and the desired measurements in the global coordinate system in object space. The calibration parameters can be determined from sensor measurements of targets of known positions in the global coordinate system. These targets must be firmly and securely placed on a solid and stable structure. The same targets are also used to register the data from all the different sensors in the same global coordinate system.
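As an illustration of the calibration idea (known target positions in the global frame plus their image measurements yield the sensor parameters), the sketch below estimates a camera projection matrix with the standard direct linear transform (DLT). This generic textbook method, applied to synthetic data, stands in here for the full photogrammetric calibration actually used; none of the numbers come from the paper.

    import numpy as np

    def dlt(object_points, image_points):
        """Estimate a 3x4 projection matrix P from >= 6 point correspondences."""
        rows = []
        for (X, Y, Z), (x, y) in zip(object_points, image_points):
            rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
            rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
        _, _, vt = np.linalg.svd(np.array(rows))
        return vt[-1].reshape(3, 4)          # null-space vector = solution (up to scale)

    def project(P, X):
        Xh = np.append(X, 1.0)
        x = P @ Xh
        return x[:2] / x[2]

    # Synthetic example: a known camera observing hypothetical calibration targets.
    rng = np.random.default_rng(0)
    P_true = np.array([[800., 0., 320., 100.],
                       [0., 800., 240., 50.],
                       [0., 0., 1., 2.]])
    targets = rng.uniform([-1, -1, 2], [1, 1, 4], size=(10, 3))   # global XYZ (m)
    observations = np.array([project(P_true, X) for X in targets])

    P_est = dlt(targets, observations)
    reproj = np.array([project(P_est, X) for X in targets])
    print(np.max(np.abs(reproj - observations)))   # ~0: the projection is recovered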

Figure 10 shows the calibration and registration structure from various viewing angles. The calibration targets are chosen to be spherical in shape. This allows them to be viewed from any angle or direction without being geometrically distorted. The targets were surveyed with a Leica T101 3” theodolite to a positioning accuracy of 0.08 mm.

Figure 10: Some views of the calibration structure

The CRS is used to calibrate the 8 CCD cameras and the Biris scans taken at a single position. The calibration parameters for the cameras and scans are all in the same coordinate system, defined by the theodolite measurements of the spherical targets.

For this system design, it is not required to perform the calibration on-line. The calibration is best performed at a nearby location in an environmentally controlled facility dedicated to this purpose. However, the sensors must not change their positions on the DAP or have their internal parameters, such as the lens focus, altered after they have been calibrated. This requires that the sensors be firmly mounted so that the movement and vibration of the platform do not invalidate the calibration parameters. The mounts used for this experiment kept the sensors in place over the duration of the project; this was checked by repeating the target measurements over several days and comparing the different sets of measurements. The calibration procedure was only repeated when the sensors had been deliberately reconfigured.

2.3. The VE facility

A facility dedicated to VE research and applications has been established at the Institute for Information Technology at NRC. The goal is to develop a 3-D electronic visualization test bed that integrates technologies in the fields of VE, real-time imaging, and 3-D range sensing in order to display and interact with a digital model of an environment in a realistic manner. The main objectives of the current project are:

1- To demonstrate, in a virtual environment, the realism of the reconstruction produced from the generated digital model;
2- to help in the development of new digital modeling schemes to improve the realism of the digital model produced;
3- to experiment with and develop various devices to interact with the virtual environment;
4- to assess the usefulness of such systems in realistic applications;
5- to acquire expertise in the field of VE and 3-D interactive graphics systems; and
6- to solve the visual latency problem with large models generated by various sensors.

The facility, a 10 m x 6 m x 3 m room (Figure 11), includes the following equipment:

- One high-speed rear projector, model 9500 from Electrohome;
- one 2.25 m by 3 m rear-projection screen;
- several liquid-crystal glasses and a controller for large rooms;
- one SGI Infinite Reality graphics workstation and two PCs; and
- electromagnetic hand and head trackers by Ascension Technology.

[Figure 11 labels: (a), (b); screen, operator, hand tracker, head tracker, LCD stereo glasses, target, high-speed graphics machine, communication controller, target computer, projector.]

Figure 11. The VE facility at NRC [called ViEW (Virtual Environment Wall)].

2.4. The test site

The overall dimensions of the demo site are shown in Figure 1-A. Figure 12 shows two views of the walls and surfaces in the demo site. The site includes:

• A number of "reference" targets placed on stable surfaces, whose positions are known in the global coordinate system (Figure 12-A). The target positions were determined by theodolite measurements. These targets are used for the accuracy evaluation of the system.

• Various surface types (e.g. Figure 12-B) to evaluate the ability of the range sensor to detect different surfaces.

• Various line-features or edges.


Figure 12: Some views from the test site

3. THE 3-D MAPPING PROCEDURES

The computational mapping paradigm consists of the following steps:

1- Data collection and processing.
2- Construction of the 3-D model.
3- Display and manipulation of the model in VE.
4- Texture mapping.

Several software modules have been written to implement the various procedures. Each of the above steps is described in the next sections, but first some details on the method are given.

3.1. Data collection and processing

In Figure 13, images within one strip (i.e. at one vehicle position) are pre-calibrated: their parameters relative to each other are known. Images in different strips are of unknown location relative to each other. These parameters are computed with the so-called bundle adjustment approach [25] (simultaneous photogrammetric triangulation of all data from all images). In the bundle adjustment, additional constraints, in the form of the known relationships of the images in each strip, are utilized to strengthen the solution.

Each point p, extracted from an image i, has two image coordinates, x and y, and contributes two equations:

x_p = f_x(X_p, Y_p, Z_p, X_i, Y_i, Z_i, pitch_i, yaw_i, roll_i)
y_p = f_y(X_p, Y_p, Z_p, X_i, Y_i, Z_i, pitch_i, yaw_i, roll_i)        (1)

The parameters on the right-hand side of the above equations are the XYZ coordinates of the point p in the required 3-D coordinate system, and the camera position and orientation (six parameters) in the same coordinate system. Those six camera parameters are the same for all points measured in the same image; however, each point adds three new XYZ coordinates.
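Equation (1) is the classical pair of collinearity equations. A minimal sketch is given below; the Euler-angle convention, sign convention, and focal length are our assumptions for illustration only.

    import numpy as np

    def rotation(pitch, yaw, roll):
        """Rotation matrix from pitch, yaw, roll (radians); one common convention."""
        cp, sp = np.cos(pitch), np.sin(pitch)
        cy, sy = np.cos(yaw), np.sin(yaw)
        cr, sr = np.cos(roll), np.sin(roll)
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx

    def collinearity(point_xyz, cam_xyz, pitch, yaw, roll, focal=0.008):
        """Image coordinates (x_p, y_p) of an object point, as in equation (1)."""
        # Rotate the camera-to-point vector into the camera frame...
        u, v, w = rotation(pitch, yaw, roll).T @ (np.asarray(point_xyz) - np.asarray(cam_xyz))
        # ...and apply the perspective projection with focal length `focal`.
        return -focal * u / w, -focal * v / w

    # Hypothetical point and camera pose.
    print(collinearity([2.0, 1.0, 5.0], [0.0, 0.0, 0.0], 0.0, 0.0, 0.0))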

Figure 13: The configuration of video images (strip 1 and strip 2, taken at vehicle positions 1 and 2).

Since each point adds only two new equations, a solution for all the unknown parameters from a single image is not possible. If some of these points appear in another image, they do not add any new XYZ coordinates, but six new camera parameters are added. When there is a sufficient number of overlapping images with a sufficient number of common points, the number of equations is usually larger than the number of unknowns, and a solution is possible. This is the main principle of the photogrammetric triangulation approach known as bundle adjustment [25]. The system of equations to be solved is an over-determined nonlinear system that requires an iterative solution. In the first iteration, initial values for the unknowns must be provided. If these values are too far from the correct values and the geometric configuration of the cameras is poorly suited for triangulation, this iterative solution will break down. In outdoor mapping systems, the vehicle position, and thus the camera positions, are determined by a GPS and INS combination and therefore are not solved for. In indoor mapping, however, GPS cannot be used, and the system must solve for all the parameters.

Figure 14: Simplified block diagram of the computation procedure (for each vehicle position: image acquisition, storage of the data on the vehicle PC, and transfer to the processing PC; then bundle adjustment using the calibration, computation of the vehicle position and orientation, registration of all data in the global system, and transfer to the modeling/CAD workstation).

The proposed solution overcomes the above-mentioned problems associated with the bundle adjustment. Figure 14 describes the procedure. The main steps are:

1. A dead reckoning system, such as wheel encoders, provides initial values for camera position and orientation.

2. In order to strengthen the geometry of the triangulation, data from the range sensor are used. The added constraint is in the form of geometric relationships between points extracted from the range sensor image and the video image, as follows:

f(X_p1, Y_p1, Z_p1, ..., X_pn, Y_pn, Z_pn, A, B, ...) = 0        (2)

Equation (2) describes the relationship between the coordinates of the above-mentioned points, where the constants A, B, ... are known from the range sensor data (for example, a distance between two points or the parameters of a plane). Equation (2) is combined with (1) to strengthen the bundle adjustment solution.

3. The relationships between the images of the 8 CCD cameras are accurately pre-determined by a calibration procedure. This results in another set of constraints to strengthen the solution:

f(X_i1, Y_i1, Z_i1, pitch_i1, yaw_i1, roll_i1, ..., X_i8, Y_i8, Z_i8, pitch_i8, yaw_i8, roll_i8, a, b, c, ...) = 0        (3)

Equation (3) describes the relationship between the positions and orientations of a set of 8 CCD-camera images (a strip of images). The constants a, b, c, ... are known from the calibration step. Again, equation (3) is combined with equations (1) and (2) to provide a strong bundle adjustment. The sensor types and configuration shown, together with the simultaneous solution of equations (1), (2), and (3) to obtain reliable and accurate vehicle positions and absolute coordinates of landmarks, are the main novel elements of this system.
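The following toy sketch illustrates, in very reduced form, how equations of types (1) and (2) can be stacked and solved simultaneously by nonlinear least squares: reprojection residuals from two images are combined with one range-derived distance constraint, and a perturbed starting point plays the role of the dead-reckoning initial values. The yaw-only camera model, the fixed first camera, the known focal length, and all numbers are our simplifying assumptions, not the NRC implementation.

    import numpy as np
    from scipy.optimize import least_squares

    F = 0.008  # focal length in metres (assumed known)

    def project(point, cam_xyz, yaw):
        """Collinearity-style projection with a yaw-only rotation."""
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        u, v, w = R.T @ (point - cam_xyz)
        return np.array([F * u / w, F * v / w])

    # Ground truth, used only to synthesize noise-free observations.
    cam1 = (np.array([0.0, 0.0, 0.0]), 0.0)            # held fixed (datum)
    cam2_true = (np.array([1.0, 0.2, 0.0]), 0.1)       # unknown second camera
    pts_true = np.array([[-1, -1, 4.6], [-1, 0, 5.2], [-1, 1, 4.9],
                         [ 0, -1, 5.5], [ 0, 0, 5.0], [ 0, 1, 4.4],
                         [ 1, -1, 5.8], [ 1, 0, 4.7], [ 1, 1, 5.3]], float)
    obs1 = np.array([project(p, *cam1) for p in pts_true])
    obs2 = np.array([project(p, *cam2_true) for p in pts_true])
    known_dist = np.linalg.norm(pts_true[0] - pts_true[8])   # "range sensor" constraint

    def residuals(params):
        cam2_xyz, cam2_yaw = params[:3], params[3]
        pts = params[4:].reshape(-1, 3)
        r = []
        for p, o1, o2 in zip(pts, obs1, obs2):
            r.extend(project(p, *cam1) - o1)                  # equation (1), image 1
            r.extend(project(p, cam2_xyz, cam2_yaw) - o2)     # equation (1), image 2
        r.append(np.linalg.norm(pts[0] - pts[8]) - known_dist)  # equation (2) constraint
        return np.array(r)

    # Start from deliberately perturbed values (the role of the dead-reckoning data).
    x0 = np.hstack([cam2_true[0] + 0.05, cam2_true[1] + 0.02, (pts_true + 0.05).ravel()])
    sol = least_squares(residuals, x0)
    print("final cost:", sol.cost)                    # ~0: all equations satisfied
    print("recovered camera 2 position:", sol.x[:3])  # close to [1.0, 0.2, 0.0]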

Figure 15: System calibration and registration (coordinate systems: i for the CCD cameras, b for the Biris range sensor, d for the dead-reckoning device, and o for the global object frame).

In the registration procedure, a sufficient number of targets of precisely known positions in a global reference system (o in Figure 15) are imaged by all the sensors (see Section 2.2). The known positions of the targets are used to solve for the CCD-camera and range sensor internal parameters and to establish the relationship between the range sensor coordinate system (b in Figure 15) and the CCD-camera coordinate system (i in Figure 15). The result of this procedure is that all data share a common reference system defined by the coordinates of the calibration targets. For the dead-reckoning device (d in Figure 15), since it measures vehicle movements from one location to another, its registration parameters are determined by moving the vehicle from one known position to another.
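A minimal sketch of the registration computation: given the same targets expressed in a sensor's local frame and in the global frame, the rigid transform between the two frames can be estimated in closed form (the Kabsch/Procrustes solution). The target coordinates below are hypothetical, and the actual procedure also solves for internal sensor parameters.

    import numpy as np

    def rigid_transform(src, dst):
        """Least-squares rotation R and translation t such that dst ≈ R @ src + t."""
        src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
        U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = dst.mean(0) - R @ src.mean(0)
        return R, t

    # Hypothetical targets: coordinates measured by the range sensor (local frame b)
    # and theodolite-surveyed coordinates of the same targets (global frame o).
    local = np.array([[0.2, 0.1, 1.5], [1.1, -0.3, 2.0], [-0.5, 0.7, 2.8],
                      [0.9, 0.8, 1.2], [-1.0, -0.6, 2.4]])
    angle = np.radians(20.0)
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                       [np.sin(angle),  np.cos(angle), 0], [0, 0, 1]])
    glob = local @ R_true.T + np.array([5.0, 3.0, 0.5])

    R, t = rigid_transform(local, glob)
    print(np.allclose(R, R_true), np.allclose(t, [5.0, 3.0, 0.5]))  # True True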

To summarize, the data collection and processing steps, based on the current version of the system, are as follows:

• In the first position of the DAP, start the image acquisition program and enter the number of vehicle positions required to cover the entire site.

• In each position, the Biris sensor scans the scene, the 8 CCD cameras each acquire an image, and the wheel encoders provide the vehicle position. All the data are stored on the vehicle PC.

• Once the images are acquired at all the positions, the image acquisition program terminates automatically. The data are now ready to be sent to the remote PC for processing.

• Features are extracted from the CCD images. Currently the software can automatically extract isolated features (such as circles, ellipses, and rectangles) and corners. However, uneven illumination affects the reliability of the automation; therefore, this procedure requires the supervision of a human operator. It is important that the extracted targets and features are labeled correctly; for example, the same target appearing in several images must have the same identification number. In most cases this is done automatically; however, for targets or features close to each other, the automatic procedure may fail, and this is where operator intervention is required. The interface is designed to be user-friendly and requires very little training.

• The target or feature extraction provides two-dimensional coordinates of labeled points in the image coordinate system. All the points, along with the known calibration parameters for each camera, are the input to the bundle adjustment program.

• The bundle adjustment program computes the position and orientation of each image in one global 3-D coordinate system. This in turn is used to compute the position and orientation of the platform on which the cameras are mounted.

• The vehicle position and orientation are used to register all the Biris XYZ data and the target and feature XYZ coordinates in one global coordinate system.

3.2. Building and displaying the 3-D VE model

The registered images generated from the previous steps contain a huge number of points, each of which has XYZ coordinates in the global coordinate system. Even for a relatively small volume, for example a few meters of space, the size of the data may become unmanageable. Therefore, a more suitable geometric representation of the data is required in order to display and interact with it. The most suitable representation is a voxel-based triangular mesh [26]. The saving in storage space resulting from converting the data points into a surface model is discussed in Section 4. The registered 3-D images from the Biris sensor are converted from XYZ points into a single non-redundant triangular mesh that can be efficiently rendered on any graphics system. Since there is an overlap between successive images, the redundant data must first be removed. An algorithm for building a non-redundant mesh model from a large number of overlapped 3-D images has been developed at NRC and was applied to all data collected in this experiment [26].
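To illustrate the kind of data reduction involved (though not the NRC non-redundant meshing algorithm of reference 26), the sketch below bins points into voxels and keeps one centroid per voxel before meshing; the voxel size and the synthetic "wall" point cloud are assumptions made only for this example.

    import numpy as np

    def voxel_downsample(points, voxel_size):
        """Replace all points inside each voxel by the voxel's centroid."""
        keys = np.floor(points / voxel_size).astype(np.int64)
        sums, counts = {}, {}
        for key, p in zip(map(tuple, keys), points):
            sums[key] = sums.get(key, 0.0) + p
            counts[key] = counts.get(key, 0) + 1
        return np.array([sums[k] / counts[k] for k in sums])

    # Hypothetical range data lying roughly on a 12 m x 3 m wall (with sensor noise).
    rng = np.random.default_rng(1)
    n = 100_000
    wall = np.column_stack([rng.uniform(0.0, 12.0, n),        # along the wall (m)
                            5.0 + rng.normal(0.0, 0.002, n),  # wall plane + noise
                            rng.uniform(0.0, 3.0, n)])        # height (m)
    reduced = voxel_downsample(wall, voxel_size=0.05)
    print(len(wall), "->", len(reduced), "points")            # large reduction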


Figure 16: 3-D model of the site. (a) Geometric model. (b) Geometric model plus texture.

The model (Figure 16-a) can be displayed using software that has the basic visualization tools, which include loading, transforming, rendering, and controlling the data. Using the SGI workstation at VIT's ViEW facility, the model can be visualized at high resolution and manipulated in real time.

3.3. Texture Mapping

Figure 17: Mapping a texture from an intensity image to the corresponding polygon in the geometric model.

While the generated geometric model is useful for managing the 3-D construction of the site, the user may need to work with other types of data or information. Other sensory information, such as light intensity or texture from the CCD video images, can be precisely mapped onto the geometric model, provided that the sensor position and orientation are known in the coordinate system of the geometric model. In this system, these data are available since the parameters for each video image are computed in the bundle adjustment procedure. Given the 3-D coordinates of the vertices of a polygon, the corresponding projections of these vertices in a video image can be located. The light intensity values within the area defined by these projected vertices are stretched and rotated to fit the counterpart 3-D polygon, as shown in Figure 17. This procedure has been implemented and tested in this experiment; the results, applied to the full model, are shown in Figure 16 (b).
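A hedged sketch of the vertex-projection step described above: each polygon vertex is projected into a video image whose pose is known from the bundle adjustment, giving the pixel region from which the texture is taken. The camera parameters and vertex coordinates below are hypothetical.

    import numpy as np

    def project_to_pixels(X, cam_xyz, R, focal, cx, cy, pixel_size):
        """Project a 3-D vertex into pixel coordinates of a calibrated image."""
        u, v, w = R.T @ (np.asarray(X) - cam_xyz)   # point in the camera frame
        x = focal * u / w                            # image-plane coordinates (m)
        y = focal * v / w
        return np.array([cx + x / pixel_size, cy - y / pixel_size])

    cam_xyz = np.array([1.0, 2.0, -3.0])             # image pose from the bundle adjustment (made up)
    R = np.eye(3)                                    # camera orientation (hypothetical)
    triangle = [np.array([0.8, 1.0, 1.2]),           # vertices of one model polygon
                np.array([1.6, 1.0, 1.2]),
                np.array([1.2, 1.8, 1.2])]

    uv = [project_to_pixels(V, cam_xyz, R, focal=0.008, cx=320, cy=240,
                            pixel_size=1.0e-5) for V in triangle]
    print(np.round(uv, 1))   # pixel positions bounding the texture patch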

4. TEST RESULTS AND ANALYSIS

The test site described in Section 2.4 is used to measure the absolute accuracy of the 3-D models produced by the system and to evaluate its performance, particularly the speed of the various operations and the ability of the sensors to detect different types of feature and surface. For the accuracy assessment, a large number of targets were placed on all the walls and the ceiling. The coordinates of these targets in the global coordinate system were measured with high-precision theodolites. The tests were performed under normal laboratory conditions (for example, no special illumination was used) and at different times of the day.

4.1. Geometric accuracy of the 3-D mapping

All the results shown below are based on two data sets. The first was acquired with the vehicle moving from the west side of the room to the east (direction A in Figure 18), and the second with the vehicle moving in the opposite direction and along a different path (B in Figure 18). Data were collected over a distance of 12 meters (a volume of 12 m L x 5 m W x 3 m H). Table 2 displays the differences between coordinates obtained from the mapping system and the known coordinates of the reference targets (see Figure 12-A) obtained from theodolite measurements. Using the absolute mean and the median on all the points (none rejected for poor quality), the accuracy was about 0.6 mm, or one part in 20,000. When poorly defined features were removed, the accuracy figures improved by a factor of two.

Figure 18: Paths of data collection (directions A and B, starting from the west side; axes X, Y, Z).

The achieved accuracy, particularly on well-defined points, is close to what can be achieved using one strip of images (next section). This shows that error propagation is very small.

Table 2: Difference between computed and known distances.

                     | Statistic | Value    | Relative accuracy
All points           | |mean|    | 0.69 mm  | 1 : 17,000
All points           | median    | 0.56 mm  | 1 : 21,000
Well-defined points  | |mean|    | 0.30 mm  | 1 : 40,000
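For clarity, the figures in Table 2 are simply the mean and median of the absolute coordinate differences, expressed relative to the mapped distance; a small sketch with made-up residuals:

    import numpy as np

    # Hypothetical residuals (system minus theodolite coordinates), in mm.
    mapped_distance_mm = 12_000.0
    errors_mm = np.array([0.4, -0.9, 0.7, -0.3, 1.2, -0.6, 0.5, -0.8])

    mean_abs = np.mean(np.abs(errors_mm))
    median_abs = np.median(np.abs(errors_mm))
    print(f"|mean| = {mean_abs:.2f} mm, median = {median_abs:.2f} mm")
    print(f"relative accuracy about 1 : {round(mapped_distance_mm / mean_abs):,}")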

4.2. Calibration accuracy

This is an indication of the accuracy of the XYZ coordinates (see the coordinate system in Figure 10) at one vehicle location, before registration with other locations, and thus depends on the calibration quality.

The estimated accuracy figures for the sensors, arranged in the configuration shown in Figure 3 and using the targets on the calibration structure as reference, are shown in Table 3. The results are close to the theoretical expectations computed from the sensor resolution and the mathematical model (theory of errors). RMS values are obtained from differences between known and computed coordinates.

Table 3: RMS values for calibration accuracy.

   | CCD pairs | Biris
X  | 0.10 mm   | 2.27 mm
Y  | 0.25 mm   | 2.44 mm
Z  | 0.21 mm   | 1.87 mm

4.3. Bundle adjustment vs. dead reckoning

The purpose of this test is to compare the readings of the vehicle wheel encoders to the more accurate bundle adjustment.

Figure 19: Approximate vehicle path from the system results ("System") and from the dead reckoning readings ("D. Reck."); all values in cm.

It is important to note that this is the best accuracy that can be achieved with the encoders, since the vehicle was generally moving in a straight line on a flat surface. The results are plotted in Figure 19. The difference was less than 100 mm up to position 16, where the vehicle made a very small angular movement (4 degrees). The maximum error, which did not follow a linear pattern, was 183 mm over a distance of about 10 meters. Assuming that the bundle adjustment is the superior solution, this error represents an accuracy of 1:54. This indicates that, even over a short distance, dead reckoning devices alone are not suitable for accurate 3-D mapping applications.

4.4. Speed of operations

The time required to acquire and process the data on a Pentium-150 PC (unless stated otherwise) is as follows:

1. For data acquisition and storage, the times, to the nearest second, were:

• At high resolution (256x512 pixels), for 4 Biris scans: 1 minute, 15 seconds
• At practical resolution (256x256 pixels), for 4 Biris scans: 50 seconds
• For the 8 video images: 1 second

The time for the Biris scans is affected by the speed of the pan and tilt unit; the Biris scans alone take only 1/60 second per line.

2. For target extraction and data processing, the times, including operator’s interaction, were:

• Operator-assisted target extraction from 216 video images: 8 hours
• Bundle adjustment on all data (216 images and more than 1200 points): 5 minutes
• Computing XYZ coordinates for every pixel in 108 Biris images: 15 minutes
• Mesh generation (on SGI): 5 minutes

The time consumed by interactive point extraction (8 hours) is expected to become drastically shorter when the system is fully automated.

5. CONCLUDING REMARKS AND RECOMMENDATIONS

The results of testing and demonstrating the system in the laboratory have shown that it can be very useful for mapping indoor environments. We feel that this technology would be most valuable where other technologies either cannot be used indoors (such as GPS) or do not have the required accuracy over the entire volume (such as dead reckoning or gyros). Based on the experience gained from this experiment, the following further developments of the system will be required in order to guarantee consistently high accuracy, improve speed, and increase the level of automation.

• It has become apparent that there is a close connection between the accuracy and the quantization of the target or feature in the image: in particular, the contrast between target and background, the number of pixels representing the target, and the quality or sharpness of the image. Any one, or a combination, of these factors can result in a shift of the target centroid and thus an error in the XYZ position. To overcome these factors, the following will be implemented in the future: the use of higher-resolution digital CCD cameras, providing the mobile platform with its own light source, and the use of higher-quality lenses.

• For the data acquisition platform, the optimum configuration of the CCD cameras is as shown in Figure 20. In this configuration seven cameras are recommended, instead of eight, which reduces the storage requirement and the processing time. The camera pairings are (1,2), (3,4), (3,5), (4,5), and (6,7).

• The pan-tilt unit will be replaced with a rotating table, thus generating only one 360-degree image at each vehicle position. This will further reduce error propagation, provide full coverage, and also reduce storage and acquisition time.

Figure 20: Optimum configuration of CCD cameras (cameras numbered 1 to 7).

• During the course of this work, it became apparent that a higher degree of automation will be required in order to reduce the level of expertise needed to operate this system and to diminish the total processing time. However, this requirement is closely linked to the application environment, and it would not be very effective to develop a highly automated system based only on experience in a laboratory environment.

ACKNOWLEDGMENTS

Louis-Guy Dicaire and Mike New developed some of the software components. Gerhard Roth provided the mesh triangulation modeling software. David Green and Doug Taylor assisted with the hardware throughout the experiments.

REFERENCES

1. G. Bishop and H. Fuchs, "Research directions in virtual environments." Computer Graphics, 26(3), pp. 153-177, Aug. 1992.
2. P. Gonzalez, "Technology Profile: Virtual Environments." SRI Consulting TechMonitoring Report, 59 pages, Jan. 1996.
3. H. Fuchs, "Virtual Environments: Past, Present, and Future." Course Notes 14, ACM SIGGRAPH, August 1996.
4. J. White, "Designing 3D Graphics: How to Create Real-Time 3D Models for Games and Virtual Reality." John Wiley & Sons, 1996.
5. D. Phillips-Mahoney, "Modeling for Virtual Reality." Computer Graphics World, 18(10), pp. 45-50, October 1995.
6. R. Stuart, "The Design of Virtual Environments." McGraw-Hill, New York, N.Y., 1996.
7. J.C. Goble, K. Hinckley, R. Pausch, J.W. Snell, and N.F. Kassel, "Two-handed spatial interface tools for neurosurgical planning." IEEE Computer, 28(7), pp. 20-26, July 1995.
8. S.R. Hedberg, "Virtual reality at Boeing: Pushing the envelope." Virtual Reality Special Report, 3(1), pp. 51-55, Jan.-Feb. 1996.
9. M.F. Polis, S.J. Gifford, and D.M. McKeown Jr., "Automating the construction of large-scale virtual worlds." IEEE Computer, 28(7), pp. 57-65, July 1995.
10. S.F. El-Hakim and J.-A. Beraldin, "On the integration of range and intensity data to improve vision-based three-dimensional measurements." In: Videometrics III, Proc. SPIE 2350, pp. 306-321, 1994.
11. K. Novak, "Mobile mapping technology for GIS data collection." Photogrammetric Engineering & Remote Sensing, 61(5), pp. 493-501, May 1995.
12. K.P. Schwarz, M.A. Chapman, M.W. Cannon, and P. Gong, "An integrated INS/GPS approach to the georeferencing of remotely sensed data." Photogrammetric Engineering & Remote Sensing, 59(11), pp. 1667-1674, Nov. 1993.
13. N. El-Sheimy, "A unified approach to multi-sensor integration in photogrammetry." Integrated Acquisition and Interpretation of Photogrammetric Data Workshop, Stuttgart, Nov. 1995.
14. H. Heister, W. Caspary, Chr. Hock, H. Klemm, and H. Sternberg, "The mobile mapping system - KiSS -." Integrated Acquisition and Interpretation of Photogrammetric Data Workshop, Stuttgart, Nov. 1995.
15. T. Aussems and M. Braess, "Mobile mapping using a surveying vehicle integrating GPS, wheel sensors and digital video cameras." Integrated Acquisition and Interpretation of Photogrammetric Data Workshop, Stuttgart, Nov. 1995.
16. E. Kortkov and R. Hoffman, "Terrain mapping for a walking planetary rover." IEEE Transactions on Robotics and Automation, 10(6), pp. 728-739, December 1994.
17. P. Weckesser, R. Dillmann, M. Elbs, and S. Hampel, "Multiple sensor-processing for high-precision navigation and environmental modeling with a mobile robot." In: Intl. Conf. on Intelligent Robots and Systems; Human Robot Interaction and Cooperative Robots (IROS), August 1995.
18. C.E. Thorpe (ed.), "Vision and Navigation: The Carnegie Mellon Navlab." Kluwer Academic Publishing, 1990.
19. R.J. Beveridge, A. Hanson, and D. Panda, "RSTA research for the Colorado State, University of Massachusetts and Alliant Techsystems team." ARPA Image Understanding Workshop, Monterey, CA, November 1994.
20. S.F. El-Hakim and P. Boulanger, "Mobile system for indoor 3-D mapping and creating virtual environments." US patent pending, 1996.
21. F. Blais, M. Rioux, and J. Domey, "Optical range image acquisition for the navigation of a mobile robot." IEEE Conf. on Robotics and Automation, Sacramento, California, April 9-11, 1991.
22. F. Blais, M. Lecavalier, and J. Bisson, "Real-time processing and validation of optical ranging in a cluttered environment." ICSPAT, Boston, MA, pp. 1066-1070, Oct. 7-10, 1996.
23. M. Rioux and F. Blais, "Compact three-dimensional camera for robotic applications." Journal of the Optical Society of America A, Vol. 3, pp. 1518-1521, Sept. 1986.
24. M. Rioux, US Patent 5,075,561, Dec. 24, 1991.
25. D.C. Brown, "The bundle adjustment - progress and prospective." International Archives of Photogrammetry, 21(3), paper no. 3-03-041, 33 pages, ISP Congress, Helsinki, Finland, 1976.
26. G. Roth and E. Wibowo, "A fast algorithm for making mesh models from multi-view range data." In: Proceedings of the DND/CSA Robotics and Knowledge Based Systems Workshop, St. Hubert, Quebec, October 1995.