3D Object Recognition and Retrieval Using Navigating Robots with Active Vision

"MROLR" A Mobile Robotic Object Locator and Retriever

A thesis submitted for the degree of

Doctor of Philosophy

Michael Wingate

August 2003

School of Electrical Engineering

Faculty of Science, Engineering and Technology

Victoria University

P.O. Box 14428, MCMC, Melbourne

Victoria 8001, Australia


Declaration

This thesis contains the work of my doctoral studies under the supervision of Assoc. Prof. Mike Sek (Victoria University of Technology), Assoc. Prof. Patrick Leung (Victoria University of Technology), and Dr Hao Shi (Swinburne University of Technology). The contents of this thesis and related work have been referenced in the following papers, which have been published at conferences or in journals, or are under review for publication.

Publications

1. M. Wingate and A. J. Davison, "Object Retrieval using an Active Stereo Head and a Robot Arm," 4th Int. Conf. on Modelling and Simulation, pp. 417-422, Australia, Nov 2002.

2. M. Wingate, "3D Modeling for the Recognition of Remote Objects," 4th Int. Conf. on Modelling and Simulation, pp. 405-410, Australia, Nov 2002.

3. M. Zulli and M. Wingate, "A Micro Mobile Surveillance System with Wireless Video", 4th Int. Conf. on Modelling and Simulation, pp. 452-457, Australia, Nov 2002.

4. N. Burley and M. Wingate, "An Active Camera Target Surveillance Tracker," 4th Int. Conf. on Modelling and Simulation, pp. 440-445, Australia, Nov 2002.

5. H. Moyssidis and M. Wingate, "A Speech Guided Robot Controller with 3D Graphical Positioning," 4th Int. Conf. on Modelling and Simulation, pp. 428-433, Australia, Nov 2002.

6. M. Wingate, "'MROLR' A Mobile Robotic Object Locator and Retriever," Post Graduate Seminar Presentation, School of Built Environment, Victoria University, Sept 2002.

7. M. Wingate and A. J. Davison, "3D Recognition of Remote Objects Utilizing Stereo Vision on a Roving Platform," ACCV2002, Proc. 5th Asian Conf. on Computer Vision, pp. 682-688, Australia, January 2002.

8. M. Wingate, "Structure Recovery of Objects Using Multiple Camera Views", Post Graduate Seminar Presentation, School of Electrical Engineering, Victoria University, 1999.


9. M. Wingate, "3D Structure Recovery of Rigid Objects Using Pan-Tilt-Vergence Stereo," Proc. Joint DICTA'97 & IVCNZ'97 Conf., Auckland, N.Z., pp. 83-88, Dec 1997.

10. C. Jones and M. Wingate, "Robotic Harvesting in a Simulated Environment," 3rd Int. Conf. on Modelling and Simulation, Melbourne, pp. 363-367, Oct. 1997.

11. M. Wingate, "3D Structure Recovery of Objects Using Line Features in Multiple Camera Views," 3rd Int. Conf. on Modelling and Simulation, Melbourne, pp. 386-391, Oct. 1997.

12. M. Wingate and A. Stoica, "An Application of Fuzzy Neurons to Visual Learning", 3rd Int. Conf. on Modelling and Simulation, Melbourne, pp. 45-50, Oct. 1997.

13. M. Wingate and A. Stoica, "A Step Towards Robot Apprentices - Visual Learning", 4th IASTED Conf. on Robotics & Manufacturing, Hawaii, pp. 120-123, Aug. 1996.

14. P.T. Im, W.S. Lee, and M. Wingate, "A progressive fuzzy clustering approach for the detection of linear boundary and corners", Australian Journal of Intelligent Information Processing Systems, Vol. 3, No. 3, pp. 42-58, Australia, 1996.

15. P.T. Im, M. Wingate, and B. Qiu, "A cluster prototype centring by membership algorithm for fuzzy clustering", 2nd Asian Conference on Computer Vision, Singapore, vol. 2, pp. 67-70, 1995.

16. A.Y.S. Chong and M. Wingate, "A Colour Feature Based Stereo Vision System for Robot Assembly", ICA/ACME Int. Conf. on Assembly combined with the Australian Conf. on Manufacturing Engineering, Nov. 1993.


Statement of Originality

This thesis contains no material that has been accepted for the award of any other degree or diploma in any university and, to the best of the author's knowledge and belief, the thesis contains no material previously published or written by other persons, except where reference is made in the text of this thesis.

Michael WINGATE

School of Electrical Engineering,

Faculty of Science, Engineering and Technology,

Victoria University of Technology

P.O. Box 14428, MCMC

Melbourne, Victoria 8001

AUSTRALIA

Acknowledgements

I am indebted to each of my supervisors for their guidance, suggestions, feedback and wise counselling. Since commencing this research, the following members of staff have taken on the burden of my supervision: Dr. Mike Sek, Dr. Patrick Leung, Dr. Hao Shi, and Dr. Len Herron. The latter two unfortunately left Victoria University prior to my thesis submission.

In 1999 I had the good fortune of spending three months at the University of Sheffield (U.K.). There my work gained major impetus under the direction of Dr. Peter Rockett and Dr. John Porrill, and PG students Simon Crossley, Bala Amavasai and Fabio Caparrelli (each having since gained a Ph.D.). In particular I wish to thank them for their valuable discussions and for providing me with access to, and help with, the University's hardware and software resources. It is at this University that "TINA" was born; it is now being continued at the University of Manchester by Dr. Neil Thacker and others (formerly of the University of Sheffield). I am also indebted to Dr. Neil Thacker for his valuable assistance in debugging library modules of TINA. The TINA environment is a significant software component of the work in this thesis.

Dr. Andrew Davison (of the Robotics Research Group, University of Oxford, UK) is another eminent scholar to whom I am deeply indebted. He has generously made his software (SCENE) available to all, software that is also a significant component of the work in this thesis. Andrew guided the adoption and modification of "SCENE" for my navigation needs. He has also co-authored several papers relating to this research.

I adopted code from Gurvinder Pal Singh's "Virtual Robotics Lab" to implement a 3D simulation environment for MROLR. Gurvinder has generously made his software available.

I owe thanks to academic staff within my discipline area, namely Dr. Wee Sit Lee and Dr. Fu Chun Zheng, to whom I frequently turned for assistance in obtaining analytical solutions to difficult problems.

I owe special gratitude to a large number of the Technical Officers within the School of Electrical Engineering, in particular to Foster Hayward for providing excellent technical advice and for his contribution in establishing the local area network required for MROLR. Both Foster and Taky Chan also played a key role in the construction of vital hardware and in keeping it functioning. Hahn Vuong is owed gratitude for maintaining the various Linux, Windows and DOS platform software. Ralph Phillips (now retired) also gave significant and valuable technical assistance and advice.

Several PG students contributed either directly or indirectly to this work in the form of the knowledge gained during my role as supervisor and joint author of papers published. They were Adrian Stoica and Paul Im (both having now completed their Ph.D. before their supervisor), and Chris Jones.


Dr. Greg Martin bravely volunteered to proofread this thesis; I am truly grateful to him for this and for his useful suggestions.

Finally I wish to thank the Senior Administrators of the Faculty of Science, Engineering and Technology, the School of Electrical Engineering, and former Department of Electrical and Electronic Engineering of this University for allowing me the time and facilities to pursue this research.


Abstract

This work describes the development of a new type of "service robot" called "MROLR" capable of searching for, locating (pose determining), recognizing, and retrieving 3D objects in an indoor environment. Key related contributions include the development of 3D object-model creation and editing facilities, and the application of a vision enhanced navigation algorithm.

MROLR comprises a mobile robot transporting a four degrees of freedom (DOF) stereo head, equipped with two calibrated charge-coupled device (CCD) video cameras, and a docking robot arm. The principal navigation mode utilises Simultaneous Localisation and Map-building (SLAM) to update and maintain a map of the robot's location and multiple navigational-feature positions as it steers towards specified coordinates. The algorithm is Kalman Filter based. Navigation features comprise either known (previously stored) or newly acquired visible landmarks. During navigation, automatic navigation-feature selection and measurement is performed, and the results of the measurements are used to substantiate or correct odometry readings.

Objects to be retrieved from a scene have an associated (previously created and stored) object-model. Object-models are created by the integration of a sequence of 3D models constructed from stereo image pairs, each representative of the object in a different angular position. The model of an object sought is retrieved from a database and matched against scene-models until recognition and pose are established. Scene-models are formed during the search while panning the scene. Both object-models and scene-models comprise 3D straight line and conic segments. Recognition is based on verifying the existence of mutual groups of 3D line and/or conic edge features in both the scene and the model object. Where the object has sufficient distinctive features, recognition is view independent and tolerant to both scale variations and occlusions. On finding the object, a six DOF robot arm, attached to a caster platform, is manually docked with the mobile robot. Using the object's pose transform, the arm is able to grasp and place the object on the mobile base. The arm is then manually de-coupled from the mobile robot and the object transported back to the home position.

While it would be possible to mount the head and arm on one mobile base, the intention here is to ultimately have a lightweight, fast-moving "scout robot" to seek and find using vision, and a slower heavy-duty transport robot to lift and carry.

Finally a "3D Virtual Environment" for simulating MROLR, that could be useful for evaluating alternative map-building strategies, object grasping points, as well as for demonstration and educational purposes, has been implemented, although is not yet complete.

Key Words: 3D modeling, recognition, stereo vision, mobile robot navigation.


Contents

3D Object Recognition and Retrieval Using Navigating Robots with Active Vision i

Statement of Originality iv

Acknowledgements v

Abstract vii

Contents viii

Table of Figures xii

List of Notation xv

Chapter 1 About This Thesis 1

1.0 Introduction 1
1.1 Thesis Objectives 1
1.2 Service Robots 2
1.3 Incentive and Motivation for Developing "MROLR" 2
1.4 Contributions of this Thesis 4
1.5 The Structure of this Thesis 5
1.6 Conclusion 6

Chapter 2 Historical Developments and Related Work - A Synoptic Review 7

2.0 Introduction 7
2.1 Service Robots 7
2.1.1 Apprentice Robots 7
2.1.2 Template-Based Recognition of Pose and Motion Gestures On a Mobile Robot 8
2.1.3 LOLA 9
2.1.4 Jumbo Jet Washing Robots - SKYWASH 9
2.1.5 NOMAD 10
2.1.6 RHINO the tour guide robot 10
2.1.7 HELPMATE 10
2.1.8 JIJO-2 Mobile Robot for Office Services 11
2.2 3D Object Recognition and Pose Determination 11
2.2.2 Modeling 13
2.2.2.1 Geometric Modeling 13
2.2.2.2 Wireframe 13
2.2.2.3 Polygon Approximations 13
2.2.2.4 Generalized Cylinders 13
2.2.2.6 Superquadrics 15
2.2.2.7 Geons 16
2.2.2.8 Multiview Representations 16
2.2.2.9 CAD 17
2.2.3 3D Object Recognition 17
2.2.3.1 Hypothesis Generation and Verification Testing 18
2.2.3.2 Use of Invariants 18
2.2.3.3 Geometric Hashing 18
2.2.3.4 Constrained Search 19
2.2.3.5 Aspect graphs 19
2.2.4 Pose Determination 20
2.2.5 Verification of Object Hypotheses 21


2.2.6 Recognition Strategies 21
2.2.7 Choice of Recognition System for MROLR 22
2.3 Robots with Metric Maps 22
2.3.1 Feature-Based Maps 22
2.3.1.1 AGVs 22
2.3.2 Grid-Based Maps 24
2.3.2.1 ARIEL 24
2.3.2.2 RHINO 25
2.3.3 Robots with Topological Maps 26
2.3.3.1 TOTO 26
2.3.4 Self-Organising Robots 26
2.3.4.1 ALDER and CAIRNGORM 26
2.3.4.2 ALEF 27
2.3.4.3 ALICE 27
2.3.5 Hidden Markov Models 28
2.3.5.1 DERVISH 28
2.3.5.2 XAVIER 28
2.3.5.2 RAMONA 29
2.3.6 Robots with Hybrid Maps 29
2.3.6.1 ELDEN 29
2.3.6.2 MOBOT-IV 30
2.3.6.3 The Duckett Mobile Robot 31

2.4 Mobile Robots using Vision Based Sensors For Navigation 32

2.5 Conclusion 33

Chapter 3 System Equipment 34

3.0 Introduction 34
3.1 Hardware and related software 34
3.2 Navigation and Recognition Software 37
3.3 Conclusion 37

Chapter 4 Background on Models and Notation 38

4.0 Introduction 38
4.1 Mathematical notation and conventions adopted 38
4.1.1 Vectors, Matrices, and Coordinate Frames 38
4.1.2 Partial Derivatives 39
4.2 Camera, Bisight Head, and Mobile-Base Models 39
4.2.1 Camera Model 40
4.2.2 Stereo Head Model and Frames 43
4.2.2.1 Mobile Base Mh and Head-frame C0 45
4.2.2.2 Pan Frame C1 45
4.2.2.4 Left L and Right R Frames 46
4.2.3 Calculation of Image Coordinates from a Known 3D Feature Point 47
4.2.4 Head Angles to Fixate a Known Point 47
4.2.5 3D Point Location from Image Coordinates UL, VL, UR, VR 48
4.2.6 Mobile Base Model 52
4.3 3D Object and Scene Modeling and Recognition 55
4.3.1 Parallel Camera Geometry For Non-Parallel Stereo 56
4.3.2 Modelling and Recognition Software Environment and Associated Tools 57
4.3.2.1 The Calibration Tool 58
4.3.2.2 Stereo Tool 59
4.3.2.3 Edge Tool 59
4.3.2.4 Matcher Tool 60
4.3.2.5 TINA Generic List Structures 61

4.4 Conclusion 61

Chapter 5 Navigation 62

5.0 Introduction 62
5.1 Preliminary Mobile-Base Model Equations Revisited 63
5.2 Landmark Features For Map-Building and Navigation 65
5.3 Feature Acquisition and Fixation 67
5.3.1 Uncertainty of Fixation Measurements 68
5.4 Navigation: Map Building and Localisation 71
5.4.1 Extended Kalman Filtering 71
5.4.2 The Extended Kalman Filter 73
5.5 Localisation and Map-Building 74
5.5.1 Formulation of the State Vector and its Covariance 74
5.5.2 Initialization 74
5.5.3 State Estimation following a Movement 75
5.5.4 Measurement and Feature Search 76
5.5.5 State Vector Update following a Measurement 77
5.5.6 New Feature Initialization 78
5.5.7 Scene-Feature Deletion 79
5.5.8 World-Coordinate Frame Zeroing 80
5.6 Map-Building 81
5.6.1 Distance Increments in Place of Time Increments 82
5.6.2 Advantage of Known Features 82
5.6.3 Simplified Mobile Base Trajectory Motions 83
5.7 Map-Building Strategy Adopted 84
5.8 Software Development 85
5.8.1 Development and Modification of Navigation Software 85
5.8.1.1 Navigating to Specified Destinations (Waypoints) 85
5.8.1.2 Obstacle Avoidance Function 85
5.8.1.3 Dead Reckoning Option 86

5.9 Data Measurement Examples 90

5.10 Conclusion 91

Chapter 6 Object Modeling, Recognition and Retrieval 92

6.0 Introduction 92
6.1 Automated Model Creation 92
6.2 Portable Tables for Model Creation 95
6.2.1 TINA's Geometric List Structures 98
6.3 Recognition Algorithm 99
6.3.1 Stored Table of {(si, ti), (si+1, ti+1)} 100
6.3.2 The Matching Process 101
6.4 Automation of Recognition Process 101
6.5 Model Creation (An Alternative Method) 102
6.6 Pose for Object Retrieval 102
6.6.1 An Alternative Method For Object Grasping 103
6.7 Retrieval of the Object 103
6.8 MROLR System Simulator 109
6.9 Conclusion 116

Chapter 7 Testing and Verification 118

7.0 Introduction 118
7.1 Creation of Database Models 118
7.1.1 Storage of Focus Features and Cliques 128
7.2 Tests 129
7.2.1 Processing Times 138
7.2.2 How Reliably can Target-objects be Retrieved? 139
7.3 CD with Video Clips of MROLR Performing 140
7.4 Conclusion 141

Chapter 8 Conclusions and Further Work 142


8.0 Introduction 142
8.1 Main Contributions of This Work 142
8.1.1 Related Contributions 142
8.2 Future Directions and Work 143
8.3 What Follows? 145

Appendix A Early Attempts to Obtain Object Modeling Algorithms 146

A 1.0 Introduction 146
A 1.1 Structure Recovery of Objects Using Multiple Camera Views 146
A 1.2 Trinocular Stereo 146
A 1.2.1 Trinocular Vision (3 Camera Stereo) 147
A 1.2.2 Image Modeling 148
A 1.2.3 Image Rectification 150
A 1.2.4 Three Cameras 152
A 1.2.5 3D Reconstruction 153
A 1.3 Cluster Prototype Centring by Membership (CPCM) 156

Appendix B Hardware and Software 157

B 1.0 Introduction 157
B 2.0 Rotating Table and Mobile Transport Robot Controller Circuit Design 157
B 2.1 Model Creating Procedure Details 167
B 2.1.1 Pseudocode: Make_model_proc() 167
B 3.0 Forward Kinematics and Inverse Kinematics of the UMI RTX 6 DOF Robot 168
B 3.1 Denavit-Hartenberg Representation (D-H) 168
The A Matrices 169

B 3.2 Forward Kinematic Solution gT 170

B 3.3 Inverse Kinematics 171

B 3.3.1 'C' written function to compute inverse kinematic solution for RTX robot arm 173

Appendix C Development of a Feature Tracking Strategy 175

C 1.0 Introduction 175
C 5.5.9 System State Prediction 176
C 5.5.10 Filter Update 176
C 5.5.11 Sensor State and Itself 180
C 5.5.12 Sensor State and Observed Feature State 180
C 5.5.13 Observed Feature State and Itself 180
C 5.5.14 Sensor State and an Unobserved Feature State 180
C 5.5.15 Observed Feature State and an Unobserved Feature State 181
C 5.5.16 And Finally Two Unobserved Feature States 181
C 5.6 Multiple Steps 181
Bibliography 190


Table of Figures

Fig 2.1.1.1 Apprentice robot 8

Fig 2.1.8.1. 1 JIJO-2 mobile robot for office services 11

Fig 2.2.2. 1 Generalized cylinders [Bin87] 14
Fig 2.2.2. 2 Octrees 14
Fig 2.2.2. 3 Octrees block appearance 15
Fig 2.2.2. 4 Models produced from Superquads 15
Fig 2.2.2. 5 Example of Geons 16

Fig 2.2.3. 1 Three characteristic views of a cylinder (Aspects) 19

Fig 2.2.3. 2 Aspect graph of a cylinder 20

Fig 2.3.1. 1 Oxford's GTI vehicle (from [Dav98]) 23

Fig 2.3.2. 1 ARIEL 25

Fig 2.3.2. 2 RHINO (From Univ. Bonn, Germany) 25

Fig 2.3.3. 1 TOTO 26

Fig 2.3.4. 1 ALICE 27

Fig 2.3.5. 1 XAVIER 28
Fig 2.3.5. 2 RAMONA 29
Fig 2.3.6. 1 ELDEN 30
Fig 2.3.6. 2 MOBOT-IV 31
Fig 2.3.6. 3 Duckett mobile robot [Duc00] 31
Fig 3.1. 1 Sketch of MROLR 35
Fig 3.1. 2 Calibration tile 35

Fig 4.1.1. 1 Vector point Pin frame Co 38

Fig 4.2.1. 1 Geometry of pinhole camera model 40

Fig 4.2.2. 1 Stereo Head showing Intrinsic Parameters 44
Fig 4.2.2. 2 Stereo head joint motion reference frames 44

Fig 4.2.5. 1 Mid-point of normal between 2 rays 50

Fig 4.2.6. 1 MROLR moved to position (z, x, φ) 52
Fig 4.2.6. 2 Oxford's mobile robot geometry 53
Fig 4.2.6. 3 Geometry of a circular arc trajectory 54

Fig 4.3. 1 TINA's main interface window 56

Fig 4.3.1. 1 Convergent to rectified stereo re-projection 57

Fig 4.3.2. 1 Calibration of stereo cameras using planar tile 60


Fig 5.2. 1 Scene feature as seen from two viewpoints 67

Fig 5.3.1. 1 Uncertainty of fixation measurements 68
Fig 5.3.1. 2 Converging stereo cameras showing potential measurement errors in X and Z 70
Fig 5.3.1. 3 Fixation error estimations for I = 0.25 m and vergence uncertainties of 0.005 rad 71

Fig 5.6.2. 1 Example of saved known features (a) Light fitting ends and (b) Fire extinguisher sign 83

Fig 5.8.1. 1 Head control GUI 86
Fig 5.8.1. 2 Navigation GUI 87
Fig 5.8.1. 3 Waypoint browser GUI 87
Fig 5.8.1. 4 New waypoint creation for obstacle avoidance 88
Fig 5.8.1. 5 Flow chart illustrating SLAM or dead reckoning modes 89

Fig 6.1. 1 Features remain static w.r.t. object's frame 93
Fig 6.1. 2 Models created on computerised rotating table 93
Fig 6.1. 3 3D created model of (a) cup and (b) cube 95
Fig 6.1. 4 (a) Stereo images of scene (b) 3D scene model 95
Fig 6.1. 5 Portable model creation tables 96

Fig 6.2. 1 Application of model editor to a raw 3D cube model 97

Fig 6.2. 2 TINA 3D geometric list structures 98

Fig 6.3. 1 Matching of line segments 99

Fig 6.4. 1 Model Cube and Cup scaled, and placed over objects in the scene 102

Fig 6.7. 1 Relative position and orientation of head and arm coordinate frames remain fixed 107
Fig 6.7. 2 Partially completed transport mobile base 108
Fig 6.8. 1 Gripper normals (brown) not aligned with object surface normals (blue) 110
Fig 6.8. 2 Several simulated views of the mobile base navigating to waypoints 111
Fig 6.8. 3 Examples of auto-feature selection and covariance outputs from simulator corresponding to Fig 6.8.1(b) 112
Fig 6.8. 4 Robot arm at table with target object (small teapot) 113
Fig 6.8. 5 Arm with gripper over target object 113
Fig 6.8. 6 Gripper closes and slips over surface of object towards handle in search of more appropriate grasping location 114
Fig 6.8. 7 Target being raised 114
Fig 6.8. 8 Arm showing joint movements 115
Fig 6.8. 9 Arm showing joint movements 115
Fig 6.8. 10 Arm moving away from table with retrieved red sphere 116

Fig 7.1. 1 Object model creation 119
Fig 7.1. 2 (a) 3 Stereo views of "Large Cup", (b) Model of "Large Cup" 120
Fig 7.1. 3 (a) Single stereo view of "Vice", (b) Model of "Vice" 121
Fig 7.1. 4 (a) Single stereo view of "Shaver", (b) Model of "Shaver" 121
Fig 7.1. 5 (a) Single stereo view of "Bottle", (b) Model of "Bottle" 122
Fig 7.1. 6 (a) Single stereo view of "Can", (b) Model of "Can" 123
Fig 7.1. 7 (a) Single stereo view of "Cube", (b) Model of "Cube" 124
Fig 7.1. 8 (a) Single stereo view of "Hole Punch", (b) Model of "Hole Punch" 125
Fig 7.1. 9 (a) Single stereo view of "Stapler", (b) Model of "Stapler" 126
Fig 7.1. 10 (a) Single stereo view of "Drink Package", (b) Model of "Drink Package" 127
Fig 7.1. 11 (a) Single stereo view of "Cup", (b) Model of "Cup" 128
Fig 7.1. 12 Table of focus feature and cliques 128


Fig 7.2. 1 (a), (b) and (c) Several views of MROLR navigating towards specified waypoint, "the Table" 131
Fig 7.2. 2 Lock-on achieved for "Fire Extinguisher" sign feature patch 131
Fig 7.2. 3 Two additional mapped features (a) light fitting edge (b) clock face, intersection of hands 132
Fig 7.2. 4 Hole-Punch search 133
Fig 7.2. 5 Pose information of "Hole Punch" 134
Fig 7.2. 6 Left image of "Bottle" in scene 134
Fig 7.2. 7 3D models of "Bottle" and "Cup" 134
Fig 7.2. 8 (a) 3D model of "Bottle" in scene (b) Recognised "Bottle" transported into scene 135
Fig 7.2. 9 (a) Left image of "Cup" in scene, (b) 3D model of "Cup" in scene, (c) Recognised "Cup" transported into scene 136
Fig 7.2. 10 Cup transforms for transportation into scene and grasping 137
Fig 7.2. 11 (a) Head gazing at scene (b) Docked robot arm 137
Fig 7.2. 12 Robot retrieving located objects 138

Fig A 1.2. 1 Trinocular stereo: corresponding matches at intersection of epipolar lines 147
Fig A 1.2. 2 Epipolar lines and corresponding matches following image rectification 148
Fig A 1.2.2. 1 Standard pinhole camera, optical centre C, image point I, focal plane P 148
Fig A 1.2.3. 1 Rectification of two images 150
Fig A 1.2.5. 1 4 Degree of freedom stereo head with third camera mounted 154
Fig A 1.2.5. 2 Left, centre and right camera images of a scene 155
Fig A 1.2.5. 3 Resulting 3D trinocular reconstruction 155
Fig B.2.0. 1 Portable motorised turn tables 159
Fig B.2.0. 2 Mobile Transport base and Arm 159
Fig B.2.0. 3 Controller hardware 160
Fig B.2.0. 4 MC68HC11 Microprocessor and interface circuit 161
Fig B.2.0. 5 Flash PSD memory circuit 162
Fig B.2.0. 6 Motion controller circuit, utilises LM629 motion controllers 163
Fig B.2.0. 7 RS232 Serial communication interface circuit 164
Fig B.2.0. 8 Power and ports circuit 165
Fig B.2.0. 9 Motor driver circuit 166
Fig B 3.0. 1 DH link coordinate systems and joint assignment for the RT100 robot arm 168
Fig B 3.1. 2 Denavit-Hartenberg link/joint parameter table 169


List of Notation

Note: the symbols listed here are in a logical rather than an alphabetical sequence.

Camera, Head and Associated Symbols

∇ (del) operator; examples: ∇(y)x = ∂y/∂x, and Jacobian entries such as ∇z(t + Δt)V1, ∇z(t + Δt)V2, ∇x(t + Δt)V1, ∇x(t + Δt)V2, ∇φ(t + Δt)V1, ∇φ(t + Δt)V2
p 3D vector point comprising coordinate components px, py, pz
p^W 3D vector point in frame W (world reference frame)
m^F 3D image vector point in camera frame F (coordinates (xc, yc, f))
p^F 3D vector point in camera frame F
u, v 2D image coordinates in pixels
u0, v0 2D coordinates of image centre in pixels
(Xc, Yc, f) 3D coordinates of image point in camera frame
f camera effective focal length
Ku, Kv number of horizontal and vertical pixels/m respectively
C camera matrix
C^-1 inverse of camera matrix
α head pan angle
γL, γR head left & right vergence angles
e head tilt angle
C0 head coordinate frame
L, R left & right frames respectively, also used for camera & head vergence frames
C2 head tilt frame


C1 head pan frame
mR, mL 3D scene point in right and left camera frames respectively
m^C0 3D scene point in head frame
cR, cL offset vector along respective vergence axis, between the intersections with the tilt axis and camera optic axis
nR, nL offset vector along respective optic axis between camera optic centre and intersection with vergence axis
pL, pR horizontal offset vector (along tilt axis) between pan and tilt axes intersection, to respective vergence axis
Mh vertical offset vector (along pan axis) between mobile base centre and head frame origin
I horizontal scalar distance between camera optic centres (i.e., inter-ocular distance)
p scalar horizontal offset distance (along tilt axis) between pan and tilt axes intersection, to respective vergence axis
c scalar offset distance along respective vergence axis, between the intersections with the tilt axis and camera optic axis
n scalar offset distance along respective optic axis between camera optic centre and intersection with vergence axis
Mh scalar vertical offset distance (along pan axis) between mobile base centre and head frame origin
β angle difference between m and m_orig
R^C0C1 transform rotation matrix between frames C1 and C0
R^C1C2 transform rotation matrix between frames C2 and C1
R^C2L transform rotation matrix between frames L and C2
R^C2R transform rotation matrix between frames R and C2
(R^C2R)^-1 inverse of R^C2R (similarly for the other rotation matrices)
r, r1, r2 ray vectors (r1 = a + λb, r2 = c + ρd)


a, c vector points on rays
b, d vectors in the direction of the rays
λ, ρ scale factors
û unit vector
{u} column vector
[u] skew-symmetric matrix of u
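
For reference, the pinhole projection implied by the camera symbols above maps a camera-frame point (Xc, Yc, Zc) to pixel coordinates roughly as follows. This is only the standard textbook formulation; the exact sign conventions and the grouping of f with Ku and Kv are those of the camera model in Chapter 4:

u = u_0 + K_u f \frac{X_c}{Z_c}, \qquad v = v_0 + K_v f \frac{Y_c}{Z_c}

s\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = C \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix}, \qquad C = \begin{pmatrix} K_u f & 0 & u_0 \\ 0 & K_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix}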

Mobile Base, Landmark Features and Associated Symbols

v centre vehicle velocity
R arc radius of movement
v1, v2 linear driven wheel velocities
θ angular rotation (arc angle)
D, D1, D2 axial wheel offset distances from centre of base

x, x(t) mobile base x coordinate in world reference frame
z, z(t) mobile base z coordinate in world reference frame
φ, φ(t) mobile base orientation relative to world reference frame (x axis)
xv = (z(t), x(t), φ(t))^T mobile base state vector
z(t + Δt) robot's estimated new z coordinate position following a constant time step increment Δt
Z Shi and Tomasi operator, Z = Σ over the patch of [ gx^2  gx gy ; gx gy  gy^2 ]
gx, gy horizontal, vertical gradients of image pixel intensities
λn n-th eigenvalue


N normalised sum-of-squared-difference measure, used for feature matching: N = (1/n) Σ over the patch of ((g1 − ḡ1)/σ1 − (g0 − ḡ0)/σ0)^2. N is zero for a perfect match.
ḡ0, ḡ1, σ0, σ1 means and standard deviations of the intensity across the patches 0 and 1 respectively; n is the number of pixels in a patch

ε angular fixation error
T_Object standard homogeneous 4x4 transform relating the rotation and translation of an object frame w.r.t. C0 (the head) frame
[n o a p] the column vector elements of the homogeneous transform T = [n o a p]: n is the normal vector, o the orientation vector, a the approach vector and p the translation vector
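
Written out in full, the [n o a p] convention just described corresponds to the standard homogeneous form (shown here only to make the notation explicit):

T = \begin{bmatrix} n_x & o_x & a_x & p_x \\ n_y & o_y & a_y & p_y \\ n_z & o_z & a_z & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix}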

Extended Kalman Filter and Associated Symbols

f state transition function; the state transition function is a function of x and u
fv vehicle estimated position vector (a function of xv and u)
Δt step time interval
z measurement vector; z is used to designate an actual measurement (i.e., using the active stereo head)
mGi, m^C0_Gi measurement vector of a scene-feature in the head frame; mGi (scalar) is the length of the vector mGi
mi measurement vector of the i-th scene-feature in the world frame, transformed into angular coordinates
mi(xv, yi) emphasising that the measurement vector mi is a function of xv and yi
mGip projection of mGi onto the xz plane, mGip = √(mGix^2 + mGiz^2)
m, m(x) prediction of the measurement z, m(x) = mi(xv, yi)
x system state vector


x̂ estimate of the system state vector
x̂(k+1|k) current estimate of the system state vector (estimate of x at time step k+1 based on an estimate at time step k and an observation (measurement) made at time step k+1). At any given time, estimates of mobile-robot and scene-feature locations (in the world reference coordinate frame) are stored in the system state vector x̂
P 3(n+1)x3(n+1) symmetric covariance matrix, representing the uncertainty in x̂; n = number of known features
Pxx 3x3 covariance matrix of the estimated mobile robot state xv and itself. This matrix is a partition of P
Pxyi 3x3 covariance matrix between the estimated mobile robot state xv and feature yi. This matrix is a partition of P
Pyiyj 3x3 covariance matrix between the estimated feature states yi and yj. This matrix is a partition of P
Pyiyi 3x3 covariance matrix between the estimated feature state yi and itself. This matrix is a partition of P
Q noise; provides a means of allowing for random or unaccounted effects in the dynamic model
ν innovation; ν is the difference between the actual measurement z and the predicted measurement m
R covariance matrix of the noise. This matrix represents the covariance of the noise in the measurement
R scalar noise variance of a measurement (Δα, Δe or Δγ)
RL covariance matrix of the noise transformed into Cartesian measurement space
S innovation covariance; represents the uncertainty in a measurement (i.e., the amount by which an actual measurement differs from its predicted value)
SmGi innovation covariance of the measurement associated with mGi


W Kalman gain
xv mobile robot's position
x̂v mobile robot's position estimate
yi 3D location of the i-th feature
ŷi estimated 3D location of the i-th feature
Q covariance matrix expressing the uncertainty in fv
U covariance matrix of the control vector u; σv1, σv2 are the standard deviation estimates of the errors in the velocity control inputs v1 and v2
UL covariance matrix of the image vector uL = (uL, vL)^T
Vs scalar space volume measure
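
For orientation, the standard Extended Kalman Filter relations that tie the above symbols together can be written as follows. This is only the generic textbook form; the exact Jacobians, partitioned covariance updates and feature-selection criteria used by MROLR are derived in Chapter 5 and Appendix C:

\hat{x}(k+1|k) = f(\hat{x}(k|k), u(k)), \qquad P(k+1|k) = \nabla f \, P(k|k) \, \nabla f^{T} + Q

\nu = z - m(\hat{x}(k+1|k)), \qquad S = \nabla m \, P(k+1|k) \, \nabla m^{T} + R

W = P(k+1|k) \, \nabla m^{T} S^{-1}

\hat{x}(k+1|k+1) = \hat{x}(k+1|k) + W \nu, \qquad P(k+1|k+1) = P(k+1|k) - W S W^{T}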

Object Modelling, Recognition and Associated Symbols

(li, li+1, ei, mi) quadruple representing 3D line segment i, that is the two end points li and li+1, the direction vector between them ei, and the centroid of the line, its midpoint mi
α orientation difference between two line segments
h vector distance between two line segments
h scalar distance between two line segments (h = h d̂)
d̂ the unit normal vector between two line segments
si scalar distance from the point of minimum distance between two line segments (i, i+1) to the start of segment i, measured parallel to segment i
ti scalar distance from the point of minimum distance between two line segments (i, i+1) to the end of segment i, measured parallel to segment i


qi scalar distance from the point of minimum distance between two line segments (i, i+1) to the centre of segment i, measured parallel to segment i
pi permissible scalar error values
mi+1' the vector mi+1 modified by the addition of −h to make it coplanar with mi
θi permissible angular orientation error values
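
One consistent way to obtain these pairwise quantities for two non-parallel segments i and i+1 is sketched below. This is an illustrative reconstruction from the definitions above, not necessarily the exact formulation used in Chapter 6:

\hat{d} = \frac{e_i \times e_{i+1}}{\lVert e_i \times e_{i+1} \rVert}, \qquad \alpha = \cos^{-1}\!\left(\frac{e_i \cdot e_{i+1}}{\lVert e_i \rVert \, \lVert e_{i+1} \rVert}\right), \qquad h = (m_{i+1} - m_i)\cdot\hat{d}, \qquad \mathbf{h} = h\,\hat{d}

with s_i, t_i and q_i then measured along segment i from its start point, end point and midpoint respectively to the foot of the common normal.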


Chapter 1 About This Thesis

1.0 Introduction

The work presented in this thesis comprises the development and analysis of a mobile robotic system capable of visually locating specific 3D objects in an indoor environment and transporting them to a given location. Its classification would fit into the category of "service robots" and it has been named "MROLR", A Mobile Robotic Object Locator and Retriever.

1.1 Thesis Objectives

The main objectives of this thesis are to:

• gain a theoretical understanding of the problems relating to visually locating, identifying, and physically retrieving specified 3D objects in a known indoor environment, utilising mobile robotic platforms
• develop feasible solutions for the implementation of a working system
• evaluate the implemented solutions

Consider the requirements of a robotic system that can be given instructions to fetch a specified object in the current environment, at present confined to be indoors. If the object is visible this is of course a relatively simple task for a human, but it is still a difficult problem for a robot. Practically, the problems involved can be identified as:

• describing the target object to the robot
• autonomously navigating in the environment (including avoiding obstacles)
• visually being able to recognise the object
• determining the object's pose (that is, its position and orientation)
• grasping the object
• remaining within both financial and time constraints

The robotics/machine vision discipline is now relatively mature and a significant number of authors have developed systems that can perform several of the above tasks, i.e. autonomous navigation, object recognition and pose determination, and visually guided object grasping by a robot arm. Many of the authors and systems are reviewed or mentioned in the following chapters.

Putting together the desired system by building on the work of others appears, on the surface at least, quite straightforward: merely a matter of integrating and supplementing successfully proven concepts. Choosing from the available learned publications nevertheless poses dilemmas. Authors invariably claim improvements in one or more aspects of their work over previous attempts by others, claims that are often difficult to substantiate. Furthermore, interpreting their work, implementing it, and finally reaching a satisfactory degree of operation can prove difficult. By and large the above steps constitute a substantial part of this work.


Acknowledgement of the contributions made by others to this work is given in the various chapters where they occur.

The incentive and motivation for developing "MROLR", the contributions made by this work, and an outline of the thesis structure are given in sections 1.3, 1.4 and 1.5 respectively. A brief survey of service robots is given below.

1.2 Service Robots

Historically robots were designed predominantly for use in factories for purposes such as manufacturing and transportation, but advances in technology have widened their applications, enabling them to automate many tasks in non-manufacturing sectors such as agriculture, construction, health care, retailing and other services. Broadly these systems, including the one being developed in this work, may be classified as "service robots", and fall into categories such as:

Cleaning, Lawn Mowing, and Housekeeping Robots
Entertaining Robots
Humanoid and Apprentice Robots
Rehabilitation Robots
Humanitarian Mine Disarming Robots
Medical Robots
Agricultural and Harvesting Robots
Sheep Shearing Robots
Surveillance Robots
Inspection Robots
Construction Robots
Automatic Refilling Robots
Fire Fighting and Rescue Robots
Robots in the Food Industry
Tour Guides and Office Robots
Flying Robots
Space Exploration and Terrain Mapping Robots

The list of service robots, with ever-improving abilities, is growing at an impressive rate.

1.3 Incentive and Motivation for Developing "MROLR"

As implied above, the catalogue of service robots is growing rapidly and it is anticipated that in the near future autonomous systems providing safe, reliable navigation, search, and object recognition and retrieval abilities will join the list. The development of such a system was an incentive for commencing this work and has led to the creation of MROLR. MROLR is a navigating mobile robotic system capable of searching for, locating (pose determining), recognising, and retrieving 3D objects in an indoor environment. The system comprises a mobile robot transporting a 4 DOF stereo head, equipped with 2 calibrated charge-coupled device (CCD)


video cameras, and a docking robot arm. The principal navigation mode incorporates Simultaneous Localisation and Map-building (SLAM), which uses an Extended Kalman Filter (EKF) to update and maintain a map of the robot's location and multiple navigational-feature positions as it steers towards specified coordinates. Navigation features comprise either known or newly acquired visible features. Known features are stored a priori, while new features are acquired during tracking. During navigation, automatic navigation-feature selection and measurement is performed. The results of these measurements, in combination with odometry readings, are used to obtain good robot and feature position estimates. Limited obstacle detection and circumnavigation (avoidance) is incorporated.
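
As a simple illustration of the odometry-based prediction step referred to above, one common approximate form of differential-drive model (an assumption made here for illustration, taking symmetric wheel offsets D1 = D2 = D; the exact circular-arc vehicle model used by MROLR is derived in Chapters 4 and 5) predicts the base state (z, x, φ) over a step Δt from the wheel velocities v1 and v2 as:

v = \frac{v_1 + v_2}{2}, \qquad \Delta\varphi = \frac{(v_2 - v_1)\,\Delta t}{2D}

\begin{pmatrix} z \\ x \end{pmatrix}(t+\Delta t) = \begin{pmatrix} z \\ x \end{pmatrix}(t) + v\,\Delta t\,\hat{h}\!\left(\varphi(t) + \tfrac{\Delta\varphi}{2}\right), \qquad \varphi(t+\Delta t) = \varphi(t) + \Delta\varphi

where \hat{h}(\cdot) denotes the unit heading vector of the base in the ground (z, x) plane.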

Surveying of the scene for objects to be recognised occurs at specified locations termed waypoint stations. At waypoints associated with object locations (i.e., tables laden with objects), the head's gaze is lowered and it sweeps through a pan of -20 to +20 degrees in search of the desired object. At present (while panning for scene-objects) the tilt and vergence angles remain fixed during this sweep. This ensures the cameras' motion (and consequently the structure-from-motion) is consistent with their calibration. It is proposed in future to vary the tilt and vergence angles in a sequence of movements, giving a variety of different calibrated camera motions.

The recognition system utilises a database of object-models from which sought objects are selected and subsequently matched against scene-models. Database-stored object-models are individually constructed at a prior time. Object-model creation comprises the integration of a sequence of 3D models constructed from stereo image pairs, each representative of the object in a different angular position. The object-model of an object of interest to be located in a scene is retrieved from the database and matched against scene-models until recognition and pose are established. Scene-models, unlike object-models, are formed during searching, while panning the scene. Both object-models and scene-models comprise 3D straight line and conic segments. Recognition is based on verifying the existence of mutual groups of 3D line and/or conic edge features in both the scene and the model object. Where the object has sufficient distinctive features, recognition is view independent and tolerant to both scale variations and occlusions. On finding the object, a 6 DOF robot arm attached to a caster platform is manually docked with the mobile robot. Using the object's pose transform the arm is able to grasp and place the object on the mobile base. The arm is then manually de-coupled from the mobile robot and the object transported back to the home position. In a future version of MROLR, the arm will be mounted on a mobile transport robot that will be signalled via a wireless modem and, on receiving this command, will navigate to and dock with the active vision base. The construction of this "transport robot" was near completion at the time of writing (Fig 6.7.2). While it would be possible to mount the head and arm on one mobile base, the intention here is to ultimately have a lightweight, fast-moving "scout robot" and a slower heavy-duty "transport robot". The scout would map and model the environment, and seek and find target objects using vision. When a target object was located, its positional information and path directions would be relayed to the transport robot, which would proceed with the task of object retrieval and transport.
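
To make the idea of matching mutual groups of 3D line features concrete, the sketch below compares a single pair of model segments against a single pair of scene segments using two rigid-motion invariants: the angle between the segment directions and the distance between their midpoints. This is an illustrative example only and is not the TINA-based matcher actually used in this thesis; all names, thresholds and the data layout are assumptions made for the example.

/* Pairwise-invariant consistency check between model and scene 3D line
 * segments. Illustrative sketch only; not the thesis matcher.          */
#include <math.h>
#include <stdio.h>

typedef struct { double p0[3], p1[3]; } Seg3;   /* segment end points */

static void dirvec(const Seg3 *s, double d[3])  /* unit direction vector */
{
    double n = 0.0;
    for (int k = 0; k < 3; k++) { d[k] = s->p1[k] - s->p0[k]; n += d[k]*d[k]; }
    n = sqrt(n);
    for (int k = 0; k < 3; k++) d[k] /= n;
}

static void midpoint(const Seg3 *s, double m[3])
{
    for (int k = 0; k < 3; k++) m[k] = 0.5*(s->p0[k] + s->p1[k]);
}

/* Two invariants of a segment pair: angle between directions and distance
 * between midpoints. Both survive any rigid motion of the pair, so they
 * can be compared between model and scene without knowing the pose.     */
static void pair_invariants(const Seg3 *a, const Seg3 *b,
                            double *angle, double *dist)
{
    double da[3], db[3], ma[3], mb[3], dot = 0.0, d2 = 0.0;
    dirvec(a, da); dirvec(b, db); midpoint(a, ma); midpoint(b, mb);
    for (int k = 0; k < 3; k++) { dot += da[k]*db[k]; d2 += (ma[k]-mb[k])*(ma[k]-mb[k]); }
    if (dot > 1.0) dot = 1.0;
    if (dot < -1.0) dot = -1.0;
    *angle = acos(fabs(dot));   /* line direction sign is arbitrary */
    *dist  = sqrt(d2);
}

int main(void)
{
    /* Two model segments, and the same pair rotated 90 degrees about Z
     * and translated, acting as the "scene"; hard-coded test data only. */
    Seg3 model[2] = { {{0,0,0},{1,0,0}}, {{0,0,0},{0,0,1}} };
    Seg3 scene[2] = { {{2,1,0},{2,2,0}}, {{2,1,0},{2,1,1}} };

    double am, dm, as, ds;
    pair_invariants(&model[0], &model[1], &am, &dm);
    pair_invariants(&scene[0], &scene[1], &as, &ds);

    /* Pairs are mutually consistent when both invariants agree within
     * permissible errors (cf. the pi and theta_i tolerances).           */
    double eps_angle = 0.05, eps_dist = 0.01;
    int consistent = fabs(am - as) < eps_angle && fabs(dm - ds) < eps_dist;
    printf("angle %.3f vs %.3f, dist %.3f vs %.3f -> %s\n",
           am, as, dm, ds, consistent ? "consistent" : "inconsistent");
    return 0;
}

A full matcher extends this same pairwise-consistency test to groups (cliques) of segments and then estimates the object's pose transform from the consistent correspondences.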

Motivation for this work also stems from the desire to ultimately develop a system capable of rapid and reliable retrieval of objects in both domestic and industrial environments. It is believed


that potential applications for such an improved system range from simple domestic and industrial "robotic aids" to "assistants" for the severely physically disabled or visually impaired.

A prolific collection of literature exists on each of the fields of 3D object recognition, pose determination, mobile robot navigation, and robot arm kinematics relating to object grasping. Each of these comprises a significant area of research and development for the machine vision and robotics fraternity. The merging of both related and unrelated scientific endeavors provides a wealth of design building blocks and opportunities for creative mechatronic engineering applications. The emergence of service robots is such an example.

1.4 Contributions of this thesis:

The major contribution made by this thesis is in the design and implementation of "MROLR", a new type of "service robot", comprising a navigating mobile robotic platform system with active vision and retrieval abilities.

Related contributions stem from:
• Development of a 3D object-model creating facility (design and construction of a motor driven turntable, and related software).
• Development of a model editor to facilitate the manual removal and/or addition of 3D segments in formed models.
• Extension and modification of existing recognition algorithms to facilitate automated recognition while searching and scanning the scene.
• Extension and modification of existing active vision based navigation software to work with the designed mobile base. This included the writing of an obstacle detection and avoidance module.
• Development of kinematic solutions from object pose information, to enable the robot arm to grasp and retrieve the desired object.
• Preliminary design and construction of a "transport" robot base with mounted robot arm, and related software.
• Preliminary implementation of a "3D Virtual Environment" for simulating MROLR, which could be useful for evaluating alternative map-building strategies and object grasping points, as well as for demonstration and educational purposes.

As is common with many engineering projects, a substantial amount of ancillary software and electronic hardware is required and inevitably must be tailor designed and built to meet special requirements. An example is the software to enable handshaking and the exchange of both video and numerical data between the various interconnected PC platforms and hardware devices comprising MROLR. These include the mobile base, active 4 DOF head, robot arm, and frame-grabber, and are facilitated by the establishment of a local area network. Hardware examples include the electronic boards required to drive rotating or moving items such as the computer controlled model creating facility and the "transport" mobile base.

Much of the work carried out in this project was greatly assisted by the generous co-operation and advice given by colleagues acknowledged on page v, and throughout this thesis.


1.5 The Structure of this Thesis

Chapter 2 commences with a review of historical developments and work related to this research. It is primarily a synoptic review. The areas covered include service robots, 3D object-recognition, pose determination, and mobile robots with metric maps.

Chapter 3 describes the system equipment comprising MROLR. Both hardware and software requirements are examined. Specific details such as hardware circuits designed and built, and software driver modules written, are given in Appendix B.

Chapter 4 provides detailed background information on mathematical notation and concepts used throughout this thesis. A variety of models that form the preliminary analysis to following chapters are introduced. Models included are for the camera, stereo head, and mobile base. Algorithms and tools used for object and scene-modeling are also introduced.

Chapter 5 develops algorithms for autonomous navigation from an initial coordinate frame. Navigational features facilitated through active stereo vision and odometry include simultaneous map building and localisation, and limited obstacle avoidance. The algorithms are based on and build upon those of Davison [Dav98], [Dav01] and use an Extended Kalman Filter with maintenance of full covariance knowledge between sets of map-features.

Chapter 6 comprises four sections. The first outlines the 3D object-model producing facility developed to provide an object-model database of objects that MROLR could be requested to locate and retrieve in the future. These objects would be searched for at designated stopover points (waypoints) along navigational routes. The facility also includes a 3D object-model editor.

The second section outlines the development of the object recognition module, which is responsible for (a) matching requested target objects with objects in a scene, and (b) obtaining appropriate transforms to graphically transport the object-model into the scene after successful matches.

The third section establishes the transforms necessary to guide the robot arm to grasp and retrieve the located object. Grasping locations are contained within the data provided for each created object-model (i.e., handle of a cup, etc.).

The final section outlines the implementation of a "3D Virtual Environment" for simulating MROLR.

Chapter 7 describes the specific tests performed and the results obtained; it is considered that these verify the methodology of this research. Tests include navigating to a variety of specified waypoints and recognising and retrieving requested target objects. During navigation, self-localisation, map building, and the acquisition and storage of navigational features (to become "known-feature" aids for future use) are also performed.

Chapter 8, the final chapter, concludes by summarising the main contributions of this work and outlines envisaged future directions and extensions.


Appendix A describes some early approaches and methods evaluated, but ultimately discarded by the author in the process of obtaining an effective recognition module for MROLR.

Appendix B A substantial amount of ancillary software and electronic hardware was designed and developed for MROLR. Appendix B contains additional specific details relating to these designs. This is included for clarification and extension of information provided in various chapters.

Appendix C continues the algorithmic development for continuous tracking of multiple features using active vision commenced in Chapter 5. It concludes with the "Vs" criterion that provides direction as to which navigational landmark feature is optimal to track next.

Finally a comprehensive Bibliography is provided. This is a collection of references maintained and used by the author.

1.6 Conclusion

This introductory chapter has detailed the objectives, incentive and motivation, and contributions of this work. It concludes with a summary description of the content of each chapter that follows.


Chapter 2 Historical Developments and Related Work - A Synoptic Review

2.0 Introduction

Modern robot designs and applications extend well beyond the requirements of the manufacturing sector. Service robots are being utilised as aids and for tasks hitherto unimaginable by most. MROLR is classified as a "service robot": its object recognition and pose determining modules are "model based", while its navigation is a combination of landmark mapping (for route planning and localisation) and odometry. Significant progress and advances in these fields have taken place over the past three decades, spurred undoubtedly by the quest for machines with greater autonomy. A vast quantity of literature relating to these developments has also emerged, with much of it now readily accessible via the Internet. A synoptic review of associated historical developments and related work is given in the following sections of this chapter.

Section reviews are organised as follows:

• 2.1 Service robots and applications
• 2.2 3D object recognition and pose determination
• 2.3 Robots utilising metric maps (several of these are service robots)
• 2.4 Mobile robots using vision sensors to aid navigation

2.1 Service Robots

2.1.1 Apprentice Robots

Moderate success has been achieved in getting robot apprentices to learn 3D motions and tasks by visual observation (see for example Bentivegna et al. [BA02], Atkeson and Schaal [AS97]). Wingate and Stoica [WS96], [WS97] developed a fuzzy-neural robot arm controller based on the composition of triangular norms and co-norms (S-T norms). The system utilised 2 cameras to enable the robot to view (a) the motion of the Master's arm (human arm, or robot teacher's arm), and (b) the motion of its own arm (the Apprentice's arm). Two learning modes were developed. In the first, the robot apprentice looked at the Master's arm and its own alternately, and attempted on each viewing to minimise position error variables. In the second mode, only the Master's arm is watched, the robot being free to issue arm joint commands more or less at random so that its arm takes up a variety of postures. For each position the Master attempts to place his arm in a posture corresponding to that of the robot. When the arm positions are sufficiently similar, the robot is advised, and a process of validation takes place. Associations between images of the Master's arm and the joint motor commands that the apprentice robot gives to position its own arm (to arrive at similar postures) are learnt, and subsequently used as part of the apprentice robot's command set.


Apprentice robot learning (a) from a human Master's arm and (b) from a Master (pre-programmed) robot.

Fig 2.1.1.1 Apprentice robot

As discussed in Chapter 8, under "Future Directions and Work", it is intended to revisit this work with a view to adding gesture recognition capabilities to MROLR.

2.1.2 Template-Based Recognition of Pose and Motion Gestures On a Mobile Robot

Waldherr et al. [WTRM98] produced a vision-based interface designed to instruct a mobile robot through both pose and motion gestures. An adaptive dual-colour algorithm enables the robot to track and, if required, follow a person around at speeds of up to one foot per second while avoiding collisions with obstacles. This tracking algorithm quickly adapts to different lighting conditions. Gestures are recognised by a real-time template-matching algorithm. This algorithm works in two phases, one that recognises static arm poses, and another that recognises gestures (pose and motion). In the first phase, the algorithm computes the correlation of the image with a pre-recorded set of example poses (called pose templates). In the second phase, the results of the first phase are temporally matched to previously recorded examples of gestures (called gesture templates), using the Viterbi algorithm (Rabiner & Juang [RJ86]). The gesture template matcher can recognise both pose and motion gestures. The result is a stream of probability distributions over the set of all gestures, which is then thresholded and passed on to the robot's high-level controller. This approach has been integrated into their existing robot navigation and control software, where it enables human operators to:

• provide direct motion commands (e.g., stopping)


• guide the robot to places which it can memorise
• point to objects (e.g., rubbish on the floor)
• initiate clean-up tasks, where the robot searches for rubbish, picks it up, and delivers it to the nearest wastebasket.

2.1.3 LOLA

LOLA: Visual Object Detection and Retrieval is a project being undertaken at the Centre for Robotics and Intelligent Machines, NC State University, USA, which commenced in 1995 [LOLA95]. The project is aimed at producing an indoor mobile robot with a high degree of autonomy and the ability to locate and retrieve objects. Features include multi-visual integration and data fusion, navigation, and real-time computer architectures.

The recognition and detection algorithm utilises object features of colour and shape obtained from images generated by an onboard main camera. Processing is as follows: once the correct colour is detected, a second camera is used to determine the shape of the object by analysing the distortion of a projected grid emitted by an infrared laser. The robot is equipped with a four degree of freedom arm and gripper, and if the colour and shape match the required parameters, the robot will pick up the object and deliver it to a predetermined location. The hardware platform consists of a Nomad 200 robot base and turret, from Nomadic Inc., and an Intel 80486 processor.

The objectives of the LOLA project are in many respects similar to those aspired to by the work of this thesis. Significant differences, however, exist in the navigation and recognition paradigms used.

2.1.4 Jumbo Jet Washing Robots-SKYWASH

In a joint venture with AEG and Dornier, the Fraunhofer Institute IPA and Putzmeister AG in Aichtal, Germany, developed an aircraft cleaning manipulator titled "SKYWASH". Two SKYWASH robots working cooperatively clean Boeing 747-400 jumbo jets in approximately 3.5 hours, instead of 9 hours for normal manual washing. Huge cleaning brushes travel a distance of approximately 3.8 kilometres over a surface of around 2,400 m², about 85% of the entire plane's surface area, including the exterior of its engines. The main vehicle is a reinforced chassis made by Mercedes Benz and is equipped with a 380 horsepower diesel engine. Four supporting legs and a manipulator arm of 33 meters in length and 22 tons in weight are mounted on the chassis. The arm has five degrees of freedom (DOF) excluding the attached revolving brush. All the subsystems required for its operation are transported on board, including the main computer, which is designed as an interactive man-machine interface for effective communication with the operator, sensors, and washing fluid controls. A compact disc read-only memory (CD-ROM) contains aircraft-specific geometrical data. A 3D camera accurately positions the mobile robot as it travels around the aircraft. The purchase price is around 5 billion German Marks; it is claimed this is amortised within a few years.


2.1.5 NOMAD

The planet rover "NOMAD" (a NASA project of 1997) was developed by CMU, its function to explore the landscape to test new ways of communication and control technologies. Weighing 550 kg, it was supplied witii a 4-wheel drive enabling it to tum on the spot, and fitted with a chassis that could alter its tracks and wheel base to adjust to any kind of terrain. Visual tiansfer was enabled with an innovative "panospheric camera" supplying high quality pictures at an extiemely wide angle. Distorted images are corrected v^th specially developed software modules. Exact locating and positioning was carried out using a DGPS system (Differential Global Positioning System) which determined the position of the Nomad as accurately as 20 cm from its actual position in space. Sensed obstacles were recorded and plotted on a digital map. Travelling speed was limited to 0.2 meters per second. The rover completed its first mission in 1997 in Chile.

2.1.6 RHINO the tour guide robot

The Artificial Intelligence group at the Institute for Information Technology III at Bonn University developed RHINO, an autonomously navigating mobile robot capable of interacting with people and performing duties. RHINO is based on the mobile platform B21 (from the US manufacturer Real World Interface in Jaffrey, NH). It is equipped with 56 infrared sensors reacting to touch. Two 2-D laser scanners, 24 ultrasound sensors, and a stereo colour camera system provide measurements to generate maps of the surroundings. It is able to explore and map its surroundings, and path planning is based on the acquired maps. In 1997, RHINO guided some 2,000 visitors around an exhibition held in the German Museum of Bonn.

2.1.7 HELPMATE

A landmark service robot is "HELPMATE". While HELPMATE has been deployed at numerous hospitals throughout the world (King and Weinman [KW90]), it does not interact with people other than by avoiding them. HELPMATE finds its way by using a map of the facility that is stored in its memory. The map is created using AutoCad and contains information about halls, elevators, doors, and stations, and is usually limited, in definition, to the areas where the robot will travel. The robot uses this map to determine the exact route required to navigate from station to station.

A number of ultrasonic transducers and a camera-based vision system help the robot to detect obstacles. Using a spread spectrum radio link, HELPMATE controls elevators and door openers, but it is also possible to use this link for remote control of the robot.

Typical applications for the robot are:
• pickup and delivery of interoffice mail, medical records and x-ray images
• lab sample retrieval
• pharmaceutical delivery
• delivery of food plates and sterile supplies


2.1.8 JIJO-2 Mobile Robot for Office Services

JIJO-2 is an office robot currently being developed at the Electrotechnical Laboratory, Tsukuba, Japan (Matsui et al [MAFMAKH099]). Its purpose is to provide office services, such as answering queries about people's location, route guidance, and delivery tasks. Particular objectives of the project include learning abilities, face recognition, and natural spoken conversation with the office dwellers.

Fig 2.1.8.1.1 JIJO-2 mobile robot for office services (www.etl.go.jp/~7440/)

2.2 3D Object Recognition and Pose Determination

Numerous authors have carried out detailed reviews of the host of different recognition techniques that exist (e.g. Binford [Bin82], Besl and Jain [BJ85], Grimson [Gri90a], Jain and Flynn [JF93], Reiss [Rei93], Weiss [Wei93], Andersson et al [ANE93], Arman and Aggarwal [AA93], Koschan [Kos93], Pope [Pop94], Rothwell [Rot96], Ullman [Ull96], Forsyth and Ponce [FP02]). In light of these, only a selection of model-based recognition systems are reviewed in the section below. In the subsequent sections, a variety of existing methods of object modeling, recognition, pose determination, and hypothesis verification are described. Some of the descriptions below are summaries of sections of the references mentioned above.

2.2.1 3D Recognition Systems.

Perhaps the first model-based recognition system was produced by Roberts [Rob66]. The system used line segments and corners to recognise polyhedral objects and was based on a prediction and verification strategy. A follow-up system was later produced by Guzman [Guz79]. In the 1970s interest focused on feature-based techniques, as in the prominent work of Waltz [Wal72]. In 1981 Brooks [Bro82] developed the ACRONYM system, which is regarded as a landmark in model-based vision. Using generalised cylinders as basic primitives for the different parts of the object models, a variety of objects (including articulated objects) could be modeled and identified. Invariants and semi-invariants from the object models were calculated and matched to edge structures, ribbons and ellipses, in 2D images. Geometric reasoning and constraint manipulation were used to identify and position objects. Examples of objects recognised were aeroplanes (in an airfield) and engine assemblies. Limitations of ACRONYM were that time overheads were large and also that recognition of objects, such as aeroplanes, approximated a two-dimensional task.

SCERPO, constructed by Lowe [Low87], used 2D line segments for the recognition of 3D objects. Perceptual organization was used to initially group lines according to proximity, parallelism and collinearity. Ultimately these grouped lines were matched to similar structures. The implementation used 3D models and 2D images. In the 3DPO system by Bolles et al. [BHH84] range data was used, from which edges are extracted; these can be used to extract ellipses, for example. The object models used were created using an extended version of a non-commercial Computer Aided Design (CAD) system, conveniently making it possible to use the system in connection with different available CAD modelers.

An edge-based system which uses sparse visual data, titled TINA, was developed at Sheffield Univ. [PPPMF88]. Primitives used for recognition are 3D line segments and ellipses. A feature of this system is the possibility of matching several 3D descriptions (views) of one scene to each other. In principle, this facility could be used to produce 3D models by combining multiple view matches. Faugeras and his co-workers [FH86], [Fau92], [MF92], [ZF92], [Fau93] also demonstrated similar work.

Range data systems using surface information have also been developed, for example IMAGINE I [Fis89]. For this, 3D surfaces and boundary information are used. Objects recognised include robot arm assemblies, rubbish cans etc. A feature of IMAGINE I is that both articulated and occluded objects can be identified. A significant drawback, however, was that segmentation of surfaces was carried out by hand.

An object recognition system not requiring the use of a stored model was developed by Fan et al [FMN88]. This system relies on a number of views of an object being matched first; at a later time, an object may be recognised by matching it to these views.

Researchers have also considered CAD-based computer vision. A system developed by Flynn and Jain [FJ91] is also based on range images and surfaces, and obtains model information extracted from CAD models. This system, unlike the 3DPO system, utilises a commercial CAD modeler.

A novel system was developed by Dickinson [Dic91], who describes objects using geons (Biederman [Bie85]). By looking at the geons from different directions, aspects are produced. Unlike most systems described, direct metric information is not used for recognition; instead, statistics and graph structures consisting of 2D closed contours, obtained from the geon aspects in the image, are utilised.

Another approach relying on view direction was developed by Caparrelli [Cap99]. The system builds 3D object models from 2D views collected from a single camera. During training, images are pre-processed and object views represented in the system's memory using pairwise geometric histograms (PGHs). Image boundaries are first polygonized, enabling the geometrical relationship between each pair of line segments to form a frequency histogram that constitutes a feature vector during matching. A PGH registers the geometric relationship between a line segment and each of the surrounding line segments which lie within a circular region centred on the reference line. It is claimed this characteristic makes the PGH technique a local shape descriptor that works robustly in occluded and cluttered scenes.
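As a rough illustration of the idea (not Caparrelli's implementation), the sketch below builds a single pairwise geometric histogram for one reference segment: every other segment whose midpoint lies inside a circular region contributes its relative angle and its range of perpendicular distances to a 2D histogram. The bin counts, radius and length weighting are illustrative choices, and segment endpoints are assumed to be NumPy arrays.

```python
# Sketch of building one pairwise geometric histogram (PGH) for a reference
# line segment; bin sizes, radius and weighting are illustrative assumptions.
import numpy as np

def pgh(reference, segments, radius=100.0, angle_bins=8, dist_bins=16):
    p0, p1 = reference
    d = p1 - p0
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal to reference line
    ref_angle = np.arctan2(d[1], d[0])
    centre = (p0 + p1) / 2.0
    hist = np.zeros((angle_bins, dist_bins))
    for q0, q1 in segments:
        if np.linalg.norm((q0 + q1) / 2.0 - centre) > radius:
            continue                                   # outside the circular region
        e = q1 - q0
        rel = (np.arctan2(e[1], e[0]) - ref_angle) % np.pi
        a = min(int(rel / np.pi * angle_bins), angle_bins - 1)
        d0, d1 = (q0 - p0) @ n, (q1 - p0) @ n          # perpendicular distances
        lo, hi = sorted((d0, d1))
        for dist in np.linspace(lo, hi, 8):            # spread entry over the range
            b = int((dist + radius) / (2 * radius) * dist_bins)
            if 0 <= b < dist_bins:
                hist[a, b] += np.linalg.norm(e)        # weight by edge length
    return hist
```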

2.2.2 Modeling

Most machine vision recognition systems rely on model representations of one form or another. In this section various aspects of object modeling are examined; initially different types of geometrical models are introduced, and specific descriptions and details follow.

2.2.2.1 Geometric Modeling

With the introduction of CAD came the development of a significant number of geometric modelers. Systematic introductions to CAD modeling can be found in Requicha [Req80] and Requicha and Voelcker [RV81], [RV83], who enumerate factors such as domain, completeness, uniqueness, conciseness, ease of creation, and efficacy for applications as important considerations in deciding on a particular modeling methodology. Unfortunately the principal needs of CAD systems are not commensurate with those of machine vision recognition systems, which essentially require that the models consist of primitives that can be identified from an image and that such primitives are easy to extract from the model. Nevertheless, partial CAD model representations have been used effectively in various systems (see section 2.2.2.9). Variations on the different ways in which objects have been modeled for computer vision applications are briefly presented below.

2.2.2.2 Wireframe

The wireframe model is one of the simplest representations and is commonly used where the primitives extracted from the images are edges or line segments. Early systems developed by Roberts [Rob75] and Pollard et al [PPPMF88] used wireframe representation.

2.2.2.3 Polygon Approximations

Many object surface boundaries can be described or approximated by polyhedra, and for this reason polyhedral model representation has been of significant interest in computer vision, e.g. Lowe [Low91] and Dhome et al [DRLR89].

2.2.2.4 Generalized Cylinders

Generalized cylinders [Bin87] consist of (2D) cross-sections swept along a spine (axis) according to a set of rules. Cubes, cones, cylinders, wedges etc. are some examples. Good representations of many objects can be obtained by combining generalised cylinders; for example, the legs of a human being can be modeled with two cylinders attached at the knee. Extensive use has been made of generalised cylinders. Perhaps the best known application was in the ACRONYM system [Bro81].

Fig 2.2.2. 1 Generalized cylinders [Bin87]

2.2.2.5 Octrees

Like Quadtrees, which recursively decompose areas into groups of four rectangular primitives, Octrees recursively divide volumes into eight smaller cubes known as cells. Sub-division stops if any one of the resulting cells is homogeneous, that is, if the cell lies entirely inside or outside the object. If, on the other hand, the cell is heterogeneous, that is, intersected by one or more of the object's bounding surfaces, the cell is sub-divided further into eight sub-cells. The sub-division process halts when all the leaf cells are homogeneous to some degree of accuracy (Carlbom et al [CCV85]).
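The recursive subdivision just described can be summarised in a few lines. The sketch below is a minimal illustration (not drawn from [CCV85]); `inside` is a hypothetical predicate that classifies an axis-aligned cube, given as a corner and an edge length, as wholly inside, wholly outside, or intersected by the object's boundary.

```python
# Minimal octree subdivision sketch; `inside(cube)` returns "in", "out" or
# "mixed" for a cube ((x, y, z), size) and is assumed to be supplied elsewhere.
def build_octree(cube, inside, max_depth):
    state = inside(cube)                      # homogeneity test for this cell
    if state in ("in", "out") or max_depth == 0:
        return {"cube": cube, "state": state, "children": None}
    (x, y, z), size = cube
    half = size / 2.0
    children = []
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):              # split into eight sub-cells
                child = ((x + dx, y + dy, z + dz), half)
                children.append(build_octree(child, inside, max_depth - 1))
    return {"cube": cube, "state": "mixed", "children": children}
```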

Fig 2.2.2.2 Octrees (http://graphics.lcs.mit.edu/classes/6.837/F98/TALecture/)

One drawback in using Octrees to model 3D objects or scenes is that only an approximation can be made due to the use of blocks. See example below.


Fig 2.2.2.3 Octrees block appearance (http://graphics.lcs.mit.edu/classes/6.837/F98/TALecture/)

2.2.2.6 Superquadrics

Superquadrics (see e.g. Solina and Bjelogrlic [SB91]) are generalizations of quadrics. (Quadrics take the form Ax² + By² + Cz² + Dxy + Eyz + Fzx + Gx + Hy + Jz + K = 0 and are the 3D analogue of conics. Conics are two-dimensional curves described by the general equation Ax² + Bxy + Cy² + Dx + Ey + F = 0.)

The general form of a superquadric is (Ax)ⁿ + (By)ⁿ + (Cz)ⁿ = K. By varying the choice of the five parameters (A, B, C, K, n), a wide variety of object surfaces can be modeled. One potential drawback of the superquadric representation is that sharp corners, planar surfaces etc. cannot be represented exactly but will be slightly rounded. Superquadrics have been used by Pentland [Pen87], Solina and Bjelogrlic [SB91], Dickinson [Dic91] and others.
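As a small illustration of the form given above, the following test classifies a point against a superquadric surface; the parameter values are arbitrary examples, and an even exponent is assumed so that the implicit surface is closed.

```python
# Inside/outside test for the superquadric (Ax)^n + (By)^n + (Cz)^n = K.
# Parameter values are illustrative; even n gives closed box-to-ellipsoid shapes.
def inside_superquadric(x, y, z, A=1.0, B=1.0, C=1.0, K=1.0, n=4):
    return (A * x) ** n + (B * y) ** n + (C * z) ** n <= K

# e.g. inside_superquadric(0.5, 0.5, 0.5) -> True for the default parameters
```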

Fig 2.2.2.4 Models produced from Superquads (http://www.okino.com/slidshow/super3wi.htm)


2.2.2.7 Geons

The modeling schemes above have all been of a quantitative nature. Biederman [Bie85] proposed a qualitative description called geons (geometrical ions) that allows all objects to be segmented into a maximum of 36 elements. Each element consists of a unique combination of the four features: edge (straight or curved), symmetry (rotational/reflective, reflective or asymmetric), size variation (constant, expanding or expanding/contracting) and axis (straight or curved).

To achieve recognition, the scheme proposes a hierarchical set of 4 processing layers. In layers 1 and 2, data is decomposed into edges, component axes, oriented blobs, and vertices. In layer 3, 3D geon primitives, i.e. cones, cylinders and boxes, are labelled. In layer 4 the structure is extracted that specifies how the geon components inter-connect; for example, for a human figure, the arm cylinder is attached near the top of the torso cylinder (Hummel and Biederman [HB92]). Examples of geons and components formed by them are shown below:

Fig 2.2.2.5 Example of Geons (From: Pourang Irani & Colin Ware (2000))

2.2.2.8 Multiview Representations

The appearance of any 3D object is view-direction dependent and can thus be represented by a number of images taken from different directions. One recognition approach is to construct an aspect graph, comprising nodes that each represent a view in which identical features are visible. Graph matching methods can then be used to match features in the image. A drawback of this approach is that complex shaped objects may have very large and complicated aspect graphs. A variation on the above method, especially for complex objects, is the use of a view sphere comprising tessellated faces. Each face in the tessellation is used to represent an aspect, which is then used for matching (Flynn and Jain [FJ91b]). Aspects are useful for matching when parts of an object may be occluded and thus not visible, with applications to matching both to 2D images and to 3D range data.


2.2.2.9 CAD

Considerable developmental efforts have gone into producing powerful CAD modeler packages for the manufacturing and building industries. Many manufactured objects as a consequence already have existing CAD-produced models. Unfortunately, in general, CAD models are not produced for computer vision recognition tasks and thus obtaining suitable data from the CAD system directly is often quite difficult. From the literature there appear to be two different approaches to using CAD information in computer vision. In the first, the output from the CAD system is used to extract and calculate the information needed, while in the second, modification of the CAD system itself is performed to produce the required output information.

The work produced in Flynn and Jain [FJ91b] and Arman and Aggarwal [AA91] are examples of the first approach, while that in Bolles et al. [BHH84] and Park and Mitchell [PM88] are examples of the second. A current drawback in using CAD systems for computer vision is the numerous formats that exist. One attempt to standardize has been the use of the IGES (Initial Graphics Exchange Specification) format, which many CAD modelers can produce. This format was used both in [FJ91b] and [AA91].

A complication in using IGES is that output can be in several different forms for the same object. For example GEOMOD, the modeler used in [FJ91b], can produce IGES output both in "analytic form" and in NURBS (Non-uniform Rational B-Splines). Object features such as line segments, circular arcs, parametric spline curves, composite curves, planes and surfaces of revolution are often more easily matched using the "analytic form" output. More recently, image understanding packages such as [VXL01] and TargetJunior [TJ] have been created which incorporate CAD-like formats.

2.2.3 3D Object Recognition

The number of recognition techniques developed and described in the literature is quite vast, as already alluded to above. The following is therefore by necessity limited to approaches that are most relevant to the applications in this work. For a robotic vision system involving remote object retrieval, recognition must be accompanied by pose determination. To facilitate recognition, knowledge of how the objects may appear, plus images of the scene possibly containing those objects, will be required. Knowledge of object appearance is most often provided by way of "object models". Irrespective of the recognition approach used, it must be able to cope with extraneous, spurious, missing and inaccurate data, caused perhaps by poor segmentation, occlusion, image discretization, and inaccuracies in camera modeling. By and large the process of recognition involves "hypothesis generation and verification testing".


2.2.3.1 Hypothesis Generation and Verification Testing

The generation of initial hypotheses involves identifying which features on a certain model match the extracted scene features, or alternatively, to which object one or several extracted scene features belong. If the model database is large, an exhaustive search approach is prohibitive; constrained searches are therefore required. Features having invariant properties under perspective transformations can be used to constrain the search process. Nielsen [Nie85] used a certain ratio of the areas of triangles that was unique for each object. Colours have also been widely exploited (Koschan [Kos93b]).

2.2.3.2 Use of Invariants

Features that are invariant under transformations such as rigid motions, perspective projection, and scaling have been widely used in recognition systems for some time. Systems utilizing both 3D object models and 3D scene data can exploit invariants such as lengths, angles, surface areas, and volumes to aid the matching process. Invariant or nearly invariant features were used in the ACRONYM system [Bro81] and also by Lowe [Low87], who used groupings of lines based on collinearity, parallelism and proximity. Projective invariants have also attracted interest, particularly in work utilizing uncalibrated cameras. Rothwell et al. [RZFM91] use 4 invariants: 2 calculated from 5 coplanar lines, another from a conic and 2 lines lying in the same plane, and a final invariant obtained from 2 conics. An elegance that stems from this work is that models can be acquired from a single view. For recognition, hypotheses as to which object is in the scene are formed by extracting lines and conics from the image, and these are used to establish groups of invariants in an indexed hash table. In circumstances where groups of lines and conics correspond to the same object, these are merged to form final hypotheses. A verification step is then used to narrow down the list to the most plausible object.

2.2.3.3 Geometric Hashing

Geometric hashing was first introduced by Lamdan and Wolfson [LW88] and has similarities with the Hough Transform. Grimson [Gri90a] claims that the method performs well with small-scale recognition problems (i.e., models with a small number of features and scenes with limited occlusion and clutter), but it tends to suffer from false positives when the problem size is increased. Fundamental to geometric hashing, as with most recognition methods, is the assumption that the object can be represented by a number of features. Initially, it is required that several extracted features be used to define object-based local coordinate systems, invariant to the transformations applied to the object. To explain further, consider as an example the image of a 2D object. A two-dimensional coordinate system is established as follows: two convenient points on the object are chosen to define the x-axis, and the distance between these points is set to 1. The y-axis is taken as the perpendicular line through the centre of the two original basis points chosen. The remaining points on the object are now referenced to this coordinate frame. Essentially this procedure is repeated for each pair of points, i.e. all remaining point pairs on the object are chosen as basis points, and for each such pair the coordinates of the points on the object in each coordinate system are used to build a hash table. This procedure is a preprocessing step and need be carried out only once. For the recognition phase, pairs of points in the image are chosen to define local coordinate systems, and if a pair is chosen correctly, the coordinates of the points will, when hashed, point to an entry in the hash table. These points are then used to vote for a certain model. A good match is evident when a lot of points in the object vote for the same object. A detailed example may be found in Dijck et al [DKH98].
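The preprocessing and voting phases described above can be sketched for 2D point features as follows. This is a simplified toy version: the quantisation step, data structures and model format are illustrative, points are assumed to be NumPy arrays, and the midpoint of each basis pair is used as the origin, as in the description above.

```python
# Toy 2D geometric hashing: build a hash table over all basis pairs of each
# model, then vote with a trial basis chosen from the scene.
from collections import defaultdict
import numpy as np

def to_basis(p, b0, b1):
    """Coordinates of p in the frame where b0->b1 is the unit x-axis and the
    origin is the midpoint of the basis pair."""
    ex = b1 - b0
    scale = np.linalg.norm(ex)
    ex = ex / scale
    ey = np.array([-ex[1], ex[0]])
    d = (p - (b0 + b1) / 2.0) / scale
    return np.array([d @ ex, d @ ey])

def build_table(models, step=0.25):
    """models: dict mapping model name -> list of 2D points (preprocessing)."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i in range(len(pts)):
            for j in range(len(pts)):
                if i == j:
                    continue
                for k, p in enumerate(pts):
                    if k in (i, j):
                        continue
                    key = tuple(np.round(to_basis(p, pts[i], pts[j]) / step))
                    table[key].append((name, i, j))
    return table

def recognise(table, scene_pts, i, j, step=0.25):
    """Vote using scene points i, j as the trial basis; returns the best
    (model, basis) entry and its vote count, or None."""
    votes = defaultdict(int)
    for k, p in enumerate(scene_pts):
        if k in (i, j):
            continue
        key = tuple(np.round(to_basis(p, scene_pts[i], scene_pts[j]) / step))
        for entry in table.get(key, []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```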

2.2.3.4 Constrained Search

Assuming a database of objects represented by features, in theory it would be possible to try to match every scene-extracted feature to every model feature in every possible combination. The number of match combinations can be expressed as an interpretation tree, but in general, if more than a few features are to be matched, the resulting number of combinations will be prohibitively large. The constrained search approach is to prune the interpretation tree, and therefore the resulting combinations, significantly, by using relations among the matched features or pairs of matched features.

Typical constraints used are:
• The length (area) of a data fragment must be smaller than or equal to the length (area) of a corresponding model fragment, up to some bounded measurement error.
• The angle between the normals to a pair of data fragments must differ from the angle between the normals of the corresponding model fragments by no more than a bounded measurement error.
• The range of distances between two data fragments must lie within the range of distances of the corresponding model fragments, where the model range has been expanded to account for measurement errors.

Grimson [Gri90b] has shown that if all data features are known to belong to the same object, the complexity of the constrained search is quadratic, whereas in the case that some features do not belong to the model, the complexity may increase to exponential. Pollard et al [PPPMF88] constrained the matching process by exploiting several pair-wise relationships between 3D line segments obtained from stereo. Noting that all six pose parameters could be calculated from any pair of matches, and by restricting matching to only several parameter-related quantities, considerable pruning of the search time could be obtained.
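A toy sketch of an interpretation-tree search with pairwise pruning is given below. For brevity only a single pairwise constraint (inter-feature distance, analogous to the constraints listed above) is checked, the wildcard branch that real systems use to absorb spurious features is omitted, and the tolerance and point-feature format are illustrative assumptions.

```python
# Toy interpretation-tree search: assign each scene feature to a model
# feature, pruning a branch as soon as any pairwise distance is inconsistent.
import numpy as np

def consistent(assignment, s, m, scene, model, eps):
    for s_prev, m_prev in assignment:
        ds = np.linalg.norm(scene[s] - scene[s_prev])
        dm = np.linalg.norm(model[m] - model[m_prev])
        if abs(ds - dm) > eps:           # pairwise constraint violated: prune
            return False
    return True

def interpret(scene, model, eps=0.05, assignment=None, s=0):
    if assignment is None:
        assignment = []
    if s == len(scene):
        return assignment                # every scene feature explained
    for m in range(len(model)):
        if m in (mm for _, mm in assignment):
            continue                     # model feature already used
        if consistent(assignment, s, m, scene, model, eps):
            result = interpret(scene, model, eps, assignment + [(s, m)], s + 1)
            if result is not None:
                return result
    return None                          # (a wildcard branch would go here)
```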

2.2.3.5 Aspect graphs

Traditional features used in aspect graphs are lines and curves or surfaces. A cylinder, for example, has three characteristic views or aspects (Fig 2.2.3.1).

Fig 2.2.3.1 Three characteristic views of a cylinder (Aspects) (http://www.dai.ed.ac.uk/CVonline/LOCAL_COPIES/OWENS/LECT13/node4.html)


These characteristic views divide 3-space into a finite class of volumes, and each volume represents a different aspect of the object. In essence these views define the aspect graph. The aspect graph comprises nodes and arcs: nodes are the volumes in space, while the associated arcs indicate whether volumes are neighbours or not. The aspect graph of a cylinder is given in Fig 2.2.3.2.

Fig 2.2.3.2 Aspect graph of a cylinder (http://www.dai.ed.ac.uk/CVonline/LOCAL_COPIES/OWENS/LECT13/node4.html)

Moving from one aspect to another is called an event; this corresponds to moving along an arc in the aspect graph. A drawback of the aspect graph method is that it is unwieldy for complicated objects. For example, for an object such as Michelangelo's "David" it has been estimated that there are approximately 10'* different aspects (Owens [Owe97]).

Various attempts have been made to reduce the search space (Eggert et al [EBDCG92]). An interesting approach has been used by Lowe [Low91], in which a simple hidden-line algorithm eliminates most lines that cannot be seen from certain viewing directions. For such cases aspects are generated on-line, as opposed to the more common, and quite time consuming, pre-computation.

2.2.4 Pose Determination

Determining the pose of an object in recognition-based vision systems is important for a number of reasons. If physical retrieval of the object by a robot arm is desired, then pose information is essential to direct grasping of the object. If pose can be calculated during the matching phase (say from two 3D line matches), this can be used to constrain the search process considerably.

Methods used to calculate pose vary depending upon factors such as: the type of recognition system (i.e., matching 2D objects from 2D data, 3D objects from 2D data, or 3D objects from 3D data), the features used (i.e., points, lines, surfaces etc), and the imaging (i.e., range images, projective, orthographic etc). For some recognition systems pose can be determined during the matching stage. One example of this is mentioned above; another is the case of aspect graph matching, since each unique aspect corresponds to a unique view direction, so position and scale are apparent. More commonly, however, pose is calculated either analytically or iteratively from a number of matched features. Solutions to pose determination have been given by Huttenlocher and Ullman [HU88], Dhome et al. [DRLR88], Lowe [Low85], [Low91], and Faugeras and Hebert [FH86]. A number of these are briefly discussed below.

For recognition systems using the matching of 2D features to 3D models, Huttenlocher and Ullman [HU88] demonstrated that for 3 point-to-point matches, under a weak perspective projection (orthographic projection + scale), a unique analytic solution exists. Dhome et al [DRLR88] also presented a method using 3 line correspondences under perspective projection. The solution, while analytical, requires that the roots of an 8th degree polynomial be determined. An alternative iterative solution, using more than 3 lines and based on Newton's method, suggested by Lowe [Low85], is also given. Lowe [Low91] developed another iterative solution using a Levenberg-Marquardt technique and a stabilizing function, making it possible to solve under constrained conditions. The method is able to handle articulated objects as well as solving for object parameters, i.e. size. In the case of recognition systems using 3D feature to 3D model matching, if line features are used, a unique solution is possible from 2 matches, while for point features, 3 matches are required.
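For the 3D-feature-to-3D-model case, the least-squares rotation and translation can be recovered from three or more non-collinear point matches using the standard SVD-based absolute orientation solution, sketched below. This is a generic method given for illustration, not the specific algorithms of the authors cited above; `model_pts` and `scene_pts` are assumed to be Nx3 NumPy arrays of corresponding points.

```python
# Least-squares rigid pose from matched 3D points (Kabsch/absolute orientation).
import numpy as np

def pose_from_point_matches(model_pts, scene_pts):
    """Return R, t such that scene ~= R @ model + t (least squares)."""
    m_c = model_pts.mean(axis=0)
    s_c = scene_pts.mean(axis=0)
    H = (model_pts - m_c).T @ (scene_pts - s_c)            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = s_c - R @ m_c
    return R, t
```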

2.2.5 Verification of Object Hypotheses

To complete the recognition process, once a hypothesis of a matched object has been established it is necessary to verify that the hypothesis is correct. Frequently the pose of the object will make it possible to predict the location of unmatched model features in the image(s). These can then be searched for and, if matched, verification accepted. Fisher [Fis89] argues that all features (surfaces) in a model should be accounted for; that is, either they should be found in the image, or predictably occluded behind some object. Other methods use an empirically determined threshold on the fraction of model features that must be matched to data features.

2.2.6 Recognition Strategies

Prior to implementing a recognition system, decisions need to be made relating to strategies in applying primitives and their use as features, as well as the order of matching. Often these choices are governed by the specifics of the applications involved. Intuition, refined by trial-and-error, also plays a role in the decision-making process. In more recent years, systems that automatically produce recognition strategies for a given model have been considered, e.g. the classic works by Goad [Goa83] and Hansen and Henderson [HH89], in which automatic recognition strategies were produced from CAD models of objects. A variation by Ikeuchi and Kanade [IK88] demonstrated that a recognition strategy could be achieved from an object model and, importantly, a good model of the sensor. Draper [Dra93], in his "Schema Learning System" (SLS), incorporates learning knowledge-directed recognition strategies from training images (with solutions) and a library of generic visual procedures. In order to represent strategies, recognition is modeled in SLS as a sequence of small verification tasks interleaved with representational transformations. At each level of representation, features of a representational instance, called a hypothesis, are measured in order to verify or reject the hypothesis. Hypotheses that are verified are then transformed to a more abstract level of representation, where features of the new representation are measured and the process repeats itself. The recognition graphs learned by SLS are executable recognition graphs capable of recognizing the 3D locations and orientations of objects in scenes.

2.2.7 Choice of Recognition System for MROLR

From the preceding review of the literature, and from the knowledge that target objects to be located and retrieved by MROLR are man-made, the conclusion reached is that matching techniques utilising features such as lines, curved segments, and/or corners, in a constrained search, are the most appropriate for this work. The system based on the work of Pollard et al [PPPMF88] was chosen to form the foundation building blocks for the recognition module of this research. More specific details of the benefits of using 3D edge-based models are enumerated in Section 6.1.

2.3 Robots with Metric Maps

In recent years considerable advances have occurred in the area of automated mobile vehicles. The ultimate objective of much of the research is to build mobile robots capable of autonomous navigation without any human intervention. Duckett [Duc00, page 37] points out that most roboticists are more interested in building robots that work than in producing realistic cognitive models with knowledge acquisition capabilities. It is common practice in mobile robotics to incorporate in the design the necessary behaviours, feature detectors, etc., required for a particular application. If the robot is then transferred to a new environment the preinstalled competencies may fail. By contrast, a learning robot need not be given all of the details of its environment by the system designer, and its sensors and actuators need not be finely tuned (Dorigo & Colombetti [DC98]).

The robots described below (in some instances summaries of Duckett's [Duc00] surveys) were selected for their particular relevance to the problem in hand of concurrent map building and self-localisation. Each employs a model of the environment in which an explicit Cartesian reference frame is used for referencing, and mapping is either feature-based or grid-based. No attempt is made to separate the systems on the grounds of the sensor types used for perception. Ideally any navigational system should be able to accommodate and integrate new sensor types if they prove to be of benefit.

2.3.1 Feature-Based Maps

2.3.1.1 AGVs

Leonard & Durrant-Whyte [LD92], using various Robosoft automatic guided vehicles (AGVs), were able to attain precise metric maps from sonar sensors. The maps were built from a set of pre-defined geometric features including planes, cylinders, corners and edges. These features were made up of primitives known as "regions of constant depth" (RCDs), and consisted of groups of sonar returns of similar range. While the robot was in motion, self-localisation was achieved by tracking stationary features in the environment and applying an extended Kalman filter to combine the inferred position estimates. An important contribution of this work was the development of an improved sonar sensor model for robot navigation.
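A minimal sketch of this style of extended Kalman filter localisation is given below (it is not Leonard & Durrant-Whyte's implementation): the robot pose (x, y, θ) is propagated with odometry and corrected with a range-bearing observation of a stationary feature at a known map position. The motion model, noise matrices and feature parameterisation are illustrative assumptions.

```python
# Minimal EKF localisation sketch: odometry prediction plus a range-bearing
# correction against one known, stationary map feature.
import numpy as np

def predict(x, P, v, w, dt, Q):
    """Propagate pose (x, y, theta) with odometry (v = speed, w = turn rate)."""
    x_new = x + np.array([v * dt * np.cos(x[2]),
                          v * dt * np.sin(x[2]),
                          w * dt])
    F = np.array([[1, 0, -v * dt * np.sin(x[2])],
                  [0, 1,  v * dt * np.cos(x[2])],
                  [0, 0,  1]])                            # motion-model Jacobian
    return x_new, F @ P @ F.T + Q

def update(x, P, z, feature, R):
    """Fuse a range-bearing observation z = (r, phi) of a known feature."""
    dx, dy = feature[0] - x[0], feature[1] - x[1]
    q = dx ** 2 + dy ** 2
    z_pred = np.array([np.sqrt(q), np.arctan2(dy, dx) - x[2]])
    H = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q),  0],
                  [ dy / q,          -dx / q,          -1]])  # measurement Jacobian
    innov = z - z_pred
    innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi       # wrap the bearing
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                            # Kalman gain
    return x + K @ innov, (np.eye(3) - K @ H) @ P
```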

A multiple hypothesis tracking technique to deal with the problems of map building in dynamic environments was proposed by Cox & Leonard [CL94]. Specifically, the approach endeavoured to account for different possible interpretations of the robot's sensor data by maintaining multiple environment models at each time step. Each model was associated with a probability reflecting the likelihood of it being the "correct" model. Testing was verified in selected environments.

A map building mobile robot named ARNE, equipped with a single rotating sonar sensor, was implemented by David Lee [Lee95]. Navigation utilised a feature-based metric map, with self-localisation obtained via an extended Kalman filter. An additional grid-based map, derived from the feature-based map, was used for the purpose of path planning. Evaluations of various exploration strategies used by the robot to build its maps were performed (Lee & Recce [LR97]). The best results were achieved by a hybrid model-based reactive exploration strategy, consisting of wall following with some map-based interventions according to a predefined set of heuristics. Testing was carried out in a static, laboratory environment.

Davison [Dav98] implemented a real-time feature-tracking, autonomously navigating mobile robot system capable of exploration in unknown environments, using the Oxford GTI mobile robot platform and Yorick active binocular vision system. Map building was carried out using information from measurements of arbitrary features and robot odometry. Map features were updated (i.e., added or deleted) in an automated, systematic maintenance procedure. An emphasis of the work was on producing maps useful over long periods of time, permitting features to be re-detected and re-measured in areas previously visited, even if the original trajectory was not followed, and in addition on being able to recover accurate position estimation after periods of drift. A Kalman Filter-based method utilizing the full covariance was used.

Testing in a real environment was carried out, with ground truth comparisons performed using an accurate grid. Greater detail of Davison's algorithms and approach is given in Chapter 6 of this thesis.

Fig 2.3.1.1 Oxford's GTI vehicle (from [Dav98])


2.3.2 Grid-Based Maps

Grid-based maps utilise Cartesian occupancy grids, where each cell contains a measure of certainty that an object occupies the corresponding location in the robot's environment. The probabilities are obtained from pre-defined sensor models that project the robot's range-finder readings onto the corresponding grid cells. A Bayesian update rule is used to combine multiple readings over the same cell when necessary. Self-localisation is achieved through correlation of a grid constructed from the current sensor readings with a stored map, and then finding the displacement and rotation which produces the best match between the two grids. Early successful implementations using grid maps were carried out by Moravec and Elfes [ME85], [Elf87].
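The Bayesian update rule mentioned above is commonly implemented in log-odds form; a minimal sketch follows. The inverse sensor model that supplies P(occupied | reading) for the cells covered by a range reading is assumed to exist and is not shown here.

```python
# Log-odds occupancy-grid update, equivalent to the Bayesian update rule.
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def update_grid(log_odds, cells, probs, prior=0.5):
    """Fuse one range reading: `cells` are (row, col) indices it covers and
    `probs` the inverse-sensor-model probabilities for those cells."""
    for (r, c), p in zip(cells, probs):
        log_odds[r, c] += logit(p) - logit(prior)   # Bayesian log-odds update
    return log_odds

def occupancy(log_odds):
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds))     # back to probabilities
```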

An advantage of the grid-based map approach is that it avoids the correspondence problem, since there is no need to identify the source of the robot's sensor returns. In addition, occupancy grids provide a natural representation for combining different sensor modalities, provided that a good model is available for each different type of sensor. For example, Thrun et al [TFB98a] used an occupancy grid to fuse range information from stereo vision and sonar. The disadvantages of the approach are that it takes up a large amount of memory, requires precise position information for map building, and depends on accurate range-finder sensing. For example, the specular effects associated with sonar sensors often result in geometric errors in the map.

2.3.2.1 ARIEL

Yamauchi et al [YSA98] developed ARIEL, an autonomous mobile robot built on a Nomad 200 platform. ARIEL was equipped with a planar laser range finder, sonar and infrared sensors. During exploration of an unknown environment it built its own map using an integrated map building strategy known as frontier-based exploration (Yamauchi [Yam97]), which included a continuous localisation technique for correcting the robot's odometry (Schultz & Adams [SA98]). Regions between open and unexplored space in a global grid model, known as "frontiers", were detected using a process analogous to edge detection and region extraction in computer vision. An interesting feature of the system was a sensor scanning technique in which specular reflections affecting the robot's sonar sensors were corrected by the laser range finder.

The exploration strategy encouraged the robot to navigate to the nearest frontier. If this was reached, a new sensor scan was performed and the map updated, with any new frontiers detected added to the list of unexplored locations. A short-term local occupancy grid, constructed from the robot's recent perceptions, was matched to the long-term global map to perform self-localisation. Using this matching process the possible translational and rotational errors were restricted to a small space in the global grid centred around the current position estimate produced by dead reckoning. Duckett [Duc00] points out that this dependence on prior position knowledge for self-localisation means that the system would be unable to recover from becoming lost.


Fig 2.3.2.1 ARIEL (http://www.aic.nrl.navy.mil/~schultz/research/ariel/)

2.3.2.2 RHINO

RHINO (also referred to earlier in Sect 2.1.6) uses both vision and sonar sensors to build accurate metric maps. Occupancy probabilities for the grid-based map cells are obtained from a neural network fed with sonar readings, trained to convert proximity readings into circular, metric maps. Increased geometric accuracy of the maps is obtained by combining information from neighbouring sensors to reduce specular effects. Using vision sensors in addition to sonar sensors permits depth information of objects to be recovered from camera image edges, which may not be detectable using sonar alone.

RHINO also produces a topological map that is used for path planning. This map is derived from the grid-based map by using critical points detected in a Voronoi diagram to partition the unoccupied space in the grid into a set of discrete locations (Thrun [Thr98b]). The benefit of using both metric and topological maps is that it allows the robot to exploit the "orthogonal strengths" of each. Localisation is estimated from a position probability density function by repeatedly matching the sensor readings with a model of the environment.

Fig 2.3.2. 2 RHINO (From Univ. Bonn, Germany)


2.3.3 Robots with Topological Maps

Topological maps utilise a graph of connected places to represent the environment; self-localisation becomes the task of place recognition (Kortenkamp & Weymouth [KW94]). Pioneering work on navigation using topological maps was carried out by Kuipers & Byun [KB91]. Systems incorporating topological maps have the advantage that the robot does not need to know its exact position for map building. Lee [Lee96] implemented Kuipers' "Spatial Semantic Hierarchy" on a real robot, but the system was only tested in a small-scale laboratory environment consisting of three cardboard corridors.

2.3.3.1 TOTO

TOTO, a wall-following mobile robot developed by Mataric [Mat91], used a topological map and explored its environment. Landmarks were classified according to a designer-determined set of categories and identified using sonar sensors and an electronic compass. The topological map comprised nodes representing each distinctive landmark, which were linked to adjacent landmarks en route during the exploration phase. While navigating, a path to a goal location was found by spreading activation from the destination node.

Fig 2.3.3.1 TOTO (www-robotics.usc.edu)

2.3.4 Self-Organising Robots

2.3.4.1 ALDER and CAIRNGORM

ALDER and CAIRNGORM, the names given to two Fischertechnik robots developed by Nehmzow [Neh92], utilise a reactive controller, wall-following, and robot-determined landmarks to identify places. A Kohonen self-organising feature map is used for map building, allowing the robot to construct its own internal representation of the environment by clustering together similar groups of sensor readings. All world models used in this system were acquired autonomously by the robots, and sensor-motor competences were learned by a simple feedforward neural network. Others (e.g. Owen & Nehmzow [ON97], Duckett & Nehmzow [DN98]) validated the efficacy of self-organising feature maps for route-learning, wall following, and relocalisation after becoming lost, utilising a Nomad 200 robot operating in untreated, middle-scale environments. One drawback of this approach is that the robot is restricted to following fixed paths, because location recognition depends on visiting locations in the same sequence as used for map building.

2.3.4.2 ALEF

ALEF, a topological mapping system developed by Kurz [Kur96] using a modified RWI-B12 robot, utilises classification algorithms such as self-organising feature maps to group together similar sets of readings obtained from sonar sensors. From the resulting feature categorisations, the robot's environment is partitioned into contiguous regions known as "situation areas". A graph-based representation of the environment containing topological relations between the situation areas was constructed during map building.

Using the so-called A* algorithm (Nilsson [Nil80]) for path planning, the robot was also able to navigate to arbitrary locations in the map. Self-localisation was achieved by averaging odometry dead reckoning with the coordinates of observed situation areas via a Kalman filter. The approach used has the disadvantage that dead reckoning is solely relied upon when exploring unknown areas.

2.3.4.3 ALICE

Zimmer [Zim95a] developed ALICE, a concurrent map building and self-localisation mobile robot, built deliberately using only low resolution, low reliability sensors, to permit the investigation of poor quality sensor information when navigating in an unknown environment. It was equipped with passive light sensors and touch-sensitive whiskers for location recognition, and a basic dead reckoning mechanism affected by drift errors of up to 25%. ALICE, in contrast to many other robots, had no separate phases for map learning and navigation. Instead it was able to continuously adapt its internal representations through a process of lifelong learning. An interesting extension of the Growing Neural Gas network (Fritzke [Fri95]), consisting of a set of stored sensor prototypes augmented with Cartesian coordinates and the topological connections between them, was used for map building. Exploration was carried out using a reactive controller, which was subject to top-down influence from various "instincts", such as trying to reach areas of unexplored territory, or trying to improve localisation quality by moving through areas of previously charted territory. One of the most important contributions of this research was the handling of the concurrent updates to the robot's environment and location models.

Fig 2.3.4.1 ALICE (http://transit-port.net/Uwe.Zimmer/Projects/ALICE/ALICE.html)


2.3.5 Hidden Markov Models

2.3.5.1 DERVISH

Nourbakhsh [Nou98], using a customised Nomad 100 robot, implemented a navigation system named DERVISH, in which predefined feature detectors were used to identify landmarks such as doors or junctions. It utilised a high-level self-localisation algorithm known as state set progression. The robot's location model consisted of a "state set", containing a subset of the possible locations in a pre-installed topological map. Features detected in the current sensor data, as determined by the landmarks associated with the locations in the map, were used to initialise the set. Updating of a state consisted of removing it from the set and replacing it with all of the possible subsequent states, derived from the new sensor data and the robot's direction of travel. Each state was assigned a probability value using a predefined "certainty matrix", which represented the likelihood of obtaining the observed features from the actual landmarks in the environment. During the set update, these values were propagated in Bayesian fashion. While DERVISH used the most likely state to plan a path to a goal location, it would stop and re-plan if the values suggested it was no longer on the correct path. A shortcoming of the approach was the necessity to use a pre-installed topological map, feature models and a certainty matrix.
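A toy version of this kind of probabilistic state-set update is sketched below; the topological neighbour map and the certainty values are hypothetical stand-ins for DERVISH's pre-installed map and certainty matrix.

```python
# Toy state-set progression: spread belief to topological neighbours, then
# re-weight each candidate place by the certainty of the current observation.
def progress_states(belief, neighbours, observation, certainty):
    # motion step: each state can lead to itself or any neighbouring place
    predicted = {}
    for place, p in belief.items():
        successors = [place] + neighbours.get(place, [])
        for nxt in successors:
            predicted[nxt] = predicted.get(nxt, 0.0) + p / len(successors)
    # observation step: weight by P(observation | landmark at place)
    updated = {place: p * certainty.get((observation, place), 0.01)
               for place, p in predicted.items()}
    total = sum(updated.values()) or 1.0
    return {place: p / total for place, p in updated.items()}
```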

2.3.5.2 XAVIER

XAVIER, a mobile robot system implemented on an RWI-B24 by Koenig & Simmons [KS96], [KS98], was specifically designed for performing delivery tasks in an office environment. While the methodology used for feature detection and self-localisation was similar to that used by DERVISH, the environment and location models were obtained from a Partially Observable Markov Decision Process (POMDP). A pre-installed topological map was also required, but the system had some ability to adapt its environment model through on-line learning, the user-defined map being augmented with distance information from the robot's sensors and actuators. A specially written compiler translated the pre-installed models into the POMDP representation. During navigation an extended version of the Baum-Welch algorithm (Rabiner [Rab89]) was applied to improve the compiled distance, sensor and actuator models by adjusting the corresponding probabilities in the POMDP model. The Baum-Welch algorithm is an expectation maximisation (EM) method for acquiring Hidden Markov Models (HMMs) and POMDPs from data. As with the previous systems, the main disadvantage of this approach is its reliance on pre-installed knowledge.

Fig 2.3.5.1 XAVIER (Carnegie Mellon University, Robot Learning Lab)


2.3.5.3 RAMONA

Shatkay & Kaelbling [SK97], using a modified RWI-B21 robot named RAMONA, applied an extended Baum-Welch algorithm to perform off-line map building from pre-recorded sensor data. An extended HMM was used to incorporate odometric information, and an expectation maximisation algorithm maintained geometric consistency in the model. A clustering algorithm provided the initial model, which was based on local odometric relations extracted from the recorded sensor data. The final version of the algorithm produced better models from less data and fewer training iterations, and was capable of learning models for environments containing loops. A drawback in the use of HMM and POMDP models is that the robot's environment model has to be quantized into a set of discrete states.

Fig 2.3.5.2 RAMONA (http://www.cs.brown.edu/research/robotics/robots/ramona.html)

2.3.6 Robots with Hybrid Maps

Integrated systems utilising both topological and metric representations have been investigated by researchers such as Edlinger & Weiss [EW95], Yamauchi & Beer [YB96], Simhon & Dudek [SD98], and Gasos & Saffiotti [GS99]. In these systems, a graph of connected regions is used to represent the environment globally, with each region represented by a small-scale local metric map. Path planning and middle-scale navigation rely on the topological representation, while the local metric maps are used for small-scale navigation. The advantage of this approach is that a globally consistent metric map is not required.

2.3.6.1 ELDEN

A navigation system known as ELDEN was developed for a Nomad 200 robot by Yamauchi & Beer [YB96]. A reactive controller was used for exploration, which included an arbitration mechanism to combine predefined behaviours such as obstacle avoidance and wandering. A topological map was constructed from odometry calculations, with new places added whenever the distance to the nearest stored place exceeded a preset value. Dead reckoning drift errors were corrected by periodically returning the robot to its starting location and activating a special recalibration routine. The recalibration routine used an occupancy grid, constructed from the current sonar readings, compared it to a previously stored grid, and used a hill climbing method to find the transformation producing the best match between the two grids. Finally this transformation was used to update the robot's odometry. Yamauchi & Langley [YL97] subsequently extended ELDEN by incorporating stored occupancy grids, one for each place node in the map. These were used for recalibration, thus removing the need for regular homing.

A novel aspect of the system was the use of variable confidence links to represent the uncertainty in the topological relations, and then to use these to adapt the robot's map in dynamic environments. In essence, confidence levels were adjusted using simple rules to strengthen or weaken them, subsequently enabling the system to use these values to plan alternative routes should obstacles or path blockages occur.

Fig 2.3.6.1 ELDEN (http://www.aic.nrl.navy.mil/~yamauchi/gallery.html)

2.3.6.2 MOBOT-IV

MOBOT-IV, a mobile robot developed by Edlinger & Weiss [EW95], utilised a dynamically acquired internal representation of unexplored environments. The navigation system used a 360 degree laser range-finder to form a set of local metric maps, which were linked via stored connections to a global topological map. A temporary metric map was first formed using the laser range finder, and then used for self-localisation and to detect areas of unexplored territory. In practice, a predefined threshold based on the distance of the robot to the nearest stored node determined whether the current sensor map was to be added to the global map. Cross-correlation of the current sensor map with the nearest stored scan in the global map was then used to obtain self-localisation. Effectively the approach used was the same as that of Hinkel & Knieriemen [HK88], in that angle histograms were first constructed and convolved to find the most likely rotation of the robot, then x and y histograms were matched to determine the most likely translation. A region of open space detected in the current sensor scan with a width greater than that of the robot was defined as a "passage"; detected passages were ultimately added to a stack of goal destinations. Path planning was carried out on the topological map using the A* algorithm (Nilsson [Nil80]). The system was tested in a static indoor corridor, which demonstrated both its reliability and scalability. A weakness of the system is its reliance upon prior positional information for self-localisation.

Fig 2.3.6.2 MOBOT-IV (http://ag-vp-www.informatik.uni-kl.de/Projekte/MOBOT-IV/MOBOT-IV.html)

2.3.6.3 The Duckett Mobile Robot

Duckett [Duc00], utilising a hybrid deliberative-reactive control architecture, developed a mobile robot system in which all of the environment and location models, feature models, and sensor-motor competences required for navigation are acquired independently by the robot. A significant distinction of Duckett's system is the claim that the robot could recover its position even after becoming lost. Learning techniques include self-organisation, where no teaching signal is required, and self-supervised learning, where all of the training examples are generated by the robot. The system is able to build its own maps and navigate in many different indoor environments. During an exploration phase, the robot builds a graph-like representation of its environment, in which each location is identified by a description of the robot's sensory information known as a landmark. To determine an appropriate landmark recognition mechanism, an experimental procedure was developed which permitted different algorithms to be compared under identical experimental conditions. With this method existing approaches to landmark recognition were evaluated, and a new self-localisation system was developed. To overcome problems such as perceptual aliasing (the fact that landmarks may not be unique to individual places), a self-localisation algorithm was developed which accumulates sensory evidence over time so that the robot can recover its position even after becoming lost. The work was validated by testing in untreated environments of several hundred square metres.

Fig 2.3.6.3 Duckett mobile robot: the Nomad 200 "FortyTwo" (turret, base, 16 sonar sensors, 16 infrared sensors) [Duc00]


2.4 Mobile Robots using Vision Based Sensors For Navigation

Desouza and Kak [DK02] and Zhang [Zha02a] carried out comprehensive surveys on "Vision for Mobile Robot Navigation". It is evident from these surveys, as well as from the robots already discussed in section 2.2, that in the past 20 years very significant developments and progress in vision-based mobile robots for both indoor and outdoor applications have occurred. To a large part these improvements were facilitated by the very substantial improvement in PC speed and video handling. Desouza and Kak point out that it is now possible to design a vision-based navigation system that utilises a topological representation of space and an ensemble of neural networks to guide a robot through interior space, the topological representation helping the robot determine which neural networks to use, and in which part of the space. Examples of indoor robots are NEURO-NAV [MK93a], [MK93b], FUZZY-NAV [PPKK95] and FINALE [KK92]. These robots can navigate at an average speed of 17 m/min using an ordinary PC-based architecture (Pentium II 450 MHz, with no special signal processing hardware).

Desouza and Kak also cite the equally impressive progress that has been achieved in computer vision for outdoor robotics. Representative examples are the NAVLAB system (Thorpe et al [TKS87], [THK88], [Pom89], Tseng et al [TC92], Pomerleau [Pom94], [PJ96]), the work on vision-guided road-following for "Autobahns" (Dickmanns et al [DZ86], [DZ87], [DM92], [Dic99]), and the Prometheus system (Graefe et al [Gra89], [Gra92a], [Gra92b], [Gra93], [RG94]).

Their findings suggest the following broad categorisations:

(i) Map-based Navigation. These are systems that depend on user-created geometric models or topological maps of the environment.

(ii) Map-building-based Navigation. These are systems that use sensors to construct their own geometric or topological models of the environment and then use these models for navigation.

(iii) Map-less Navigation. These are systems that use no explicit representation at all of the space in which navigation is to take place, but rather resort to recognising objects found in the environment or to tracking those objects by generating motions based on visual observations.

They conclude their survey with a somewhat unexpected finding: "... if the goal is to carry out function-driven navigation ... an example being to fetch a fire-extinguisher that is somewhere in a given hallway or to stop at a stop sign under varying illumination and background conditions - we are still eons away".

To a large extent this last observation puts into context the difficulty of the work undertaken by this thesis.


2.5 Conclusion

This chapter has highlighted the rapidly growing applications of robots outside the industrial and manufacturing sectors; these are the so-called "service robots". Brief details of some of these service robots and their applications were given.

A synoptic literature review of 3D recognition systems and modeling methods, and of robots using autonomous navigation methods, was also presented. It is from this review that 3D wire-frame models were decided upon as being the most suitable for the recognition module of this work. The literature survey concludes with a summary of a recent survey on mobile robots using vision-based sensors for navigation. A noteworthy conclusion reached in this survey is that the "machine vision fraternity" is still a long way off from being able to reliably use mobile robots to find and retrieve objects in "unmodified" settings. That may well be true, but then humans also need artificial aids, such as good lighting and a suitable range of temperatures, to work well.

In the next Chapter, specifics of the hardware and software required to implement MROLR are discussed.


Chapter 3 System Equipment

3.0 Introduction

To meet the objectives of this thesis in building a robotic system capable of navigating with the aid of active vision, visually locating, and physically retrieving real-world objects, considerable hardware and software resources are required, a substantial proportion of which is not readily obtainable "off-the-shelf". Unobtainable hardware was therefore specifically designed and built, and software coded.

In this chapter, brief details of the hardware and software modules developed and utilised for this work are described.

3.1 Hardware and related software

The major hardware components of MROLR comprise:

(a) A TRC* Labmate mobile robot base powered via 2x12V fork lift truck batteries.

(b) A BiSight 4 DOF pan-tilt-vergence stereo head, with grayscale CCD cameras fitted with 13mm fixed focal length lenses (the head was also purchased from TRC).

(c) A Win32 Imascan framegrabber residing in a Win95 platform PC on board the mobile base.

(d) Two computerised revolving horizontal tables (designed by the author and assistant technical staff - see Appendix B). These tables facilitated the creation of object models.

(e) Camera calibration tile (Fig 3.1.2).

(f) UMI RTX 6 DOF robot arm, controlled via a PC (docking robot), mounted on either a caster platform or the mobile transport robot (refer to (g)).

(g) Mobile transport robot: this hardware is in a state of partial completion (designed by the author and assistant technical staff - see Appendix B).

(h) Several PCs interconnected via a local area network.

Figures 3.1.1 (a) and 3.1,1 (b) are sketches ofthe mobile base and docking arm and show system local network networked data flows. Numerous photo images of the hardware are provided in Chapter 7.

Transition Research Corporation, USA.

Chapter 3 Svstem Koninment 35

s t e r e o Inages and Mobile Robot S t a t u s In fo

Mobile Robot Cowhands

Local Network & RS232 Cables

Head Connands

BIslght Pan Tilt Head and Caneras

IFrane .Grabber!

visual Inpift

PC

TRC Mobile Robot [Robot "1 ^ . • « . I Contro l ler .

fSSu

(a)

Bat ter ies

Robot Arm Docked with Mobile Base

:

BIslght Pan Tilt Head and Cameras

F r a n F l .Grabber

Visual Inpu t

PC

TRC MoWle Robot [Robot ~| , - - > . .Cont ro l le r .

Bat te r ies

fflB

(b) (a) TRC Mobile Robot Base, (b) with Docked Robot Arm

Fig 3.1.1 Sketch of MROLR

Fig 3.1. 2 Calibration tile

Chapter 3 System Equipment 36

PC Load Distribution

The computational load is distiibuted using a number of PC's interconnected via a local TCP/IP network. This additionally accommodates platform specific requirements eliminating the need for the rewriting of third-party proprietary supplied equipment drivers.

Further details are given below:

(1.) Stereo head PC resides on the Mobile Base (platform: Win95/Dos ) . Main functions: Execution of head movements commands, Captiu"e, and fransmission of stereo scene-images,

(2,) TRC Mobile Base PC (platform: Win3.11/Dos). Main fimctions: Execution of commands relating to mobile base movements. Monitoring and displaying of odometry positional information, Confrol of motorised model-creating tables.

(3.) Principal Navigation and Recognition PC {platform: RedHat 7,0 Linux). Main functions: Processing of stereo image pairs for purposes of navigation, localisation, object recognition and object pose determination. Coordination of data sharing across the local-network.

(4.) Robot Arm PC (platform: Win98 ; Main function: Accept object pose information via dynamic data exchange (DDE). DDE provides a convenient mode of data interchange for Win32 applications. Calculation of the necessary inverse kinematic solution to facilitate the grasping and retrieval of recognised object.

(5.) Transport Mobile Base PC {platforms: Motorola HCl 1 and Win98). Main function: Respond to request to dock with the TRC mobile base, accept object pose information „. as for (4).

Essential software links and patches were written to facilitate the communication, integration, and co-operation of each of the various platforms and items of equipment. This included image format converters and DDE programs running in the background.

Processing time was not greatly considered as the PC's used were rather old (60-90 MHz, Pentiums with 32 MB RAM). At the time of writmg, PC clock speeds over ten times faster were available and consequently would produce a corresponding improvement in performance if obtained and utilised.

Chapter 3 System Equipment 37

3.2 Navigation and Recognition Software

The Navigation and Recognition Software has been largely developed around two software resources, namely "TINA" and "SCENE".

TINA is a machine vision research envfronment, running under Unix (X-windows). It is written in C and includes a substantial collection of integrated image processing functions. The development of TINA commenced at the University of Sheffield and more recentiy continued at the University of Manchester. TINA has been written to provide a research environment for machine vision programmers. It utilises a mouse driven front end, and an integrated package of vision library routines, running under the UNIX operating system with Xwindows graphics support. The individual modules and display facilities are provided as tools under the control of a parent that also displays diagnostic messages. These sub-tools can be grouped in to two types, graphical infrastructure and research.

The following sub-tools are used in this thesis: Stereo Tool Edge Tool Matcher Tool Calibration Tool

The Matcher tool was modified and extended. In addition a 3D Model editor tool was added. Further details are given in Chapter 4

SCENE is a flexible open-source C++ library for sequential localisation and map-building (SLAM), It is suited to 2D/3D mobile robot navigation, this being accomplished by the estimation of the states of the moving robot and numerous stationary features. In general measurements are made via one or more sensors mounted on the robot, A feature of SCENE is its modular modeling system, which is used to represent the specifics of a particular system. New model classes can be "plugged in" to the main system with relative ease. Model classes suitable for MROLR were implemented and incorporated into SCENE. In addition a vision based obstacle avoidance function was added. Details are provided in subsequent chapters.

Both TINA and SCENE are significant contributions to the Scientific Community and generously made freely available at the web sites indicated in the Bibliography at the end of this thesis.

3.3 Conclusion

A complex system such as MROLR requires considerable hardware and software resources. While a policy of using available resources where obtainable was adopted, tailor designed specific hardware was required to be designed and built, and software written. This chapter outlined the resources required and utilised. Subsequent chapters elaborate on how these resources are used or developed. Chapter 4 details the mathematical notation and models used throughout this work.

Chapter 4 Background on Models and Notation 38

Chapter 4 Background on Models and Notation

4.0 Introduction

In this chapter, mathematical notation, concepts, and a variety of preliminary models used and extended in subsequent chapters are outlined and developed.

The chapter is effectively divided into three sections. The first introduces the mathematical notation and conventions adopted throughout this thesis. The second considers modeling of: • cameras and the Bisight stereo head • the mobile robot base

The third section considers algorithms and tools used for: • stereo camera calibration • object and scene-modeling, • object recognition

4.1 Mathematical notation and conventions adopted

4.L1 Vectors, Matrices, and Coordinate Frames

Vectors and matrices will in general be distinguished by boldface type. Examples:

vector P = Pxi +Pyj +Pzk represents a 3D point in space having coordinates Px, Py, Pz respectively.

Coordinate frames are designated by capital letters. Where several frames are related, an additional index integer, or subscript is used (i.e., Co, Cl). Vector point P in frame Co is defined by the 1 X 3 matrix given below. (Note CO and Co designate the same frame).

Co

>Co _

(pCo^

p 0

Fig 4.LL 1 Vector point P in frame CQ

Chapter 4 Background on Models and Notation 39

4.1.2 Partial Derivatives

The V (del) operator is used for partial differentiation. V(y)x is used to represent —. If eitiier >; dx

or X is non-scalar, bold typeface is used. For example the following are equivalent representations:

5f, V _

au

dz(t + At) dz(t + At)

dv,

dv,

5v, I ^ " 2

dx(t + At) dx(t + At) 5v,

d(p(t + At) d(p(t + At) dv. dv.

= V(f,)„ =

Vz(t + At)^^ Vz(t + At)^^

Vx(t + At)^^ Wx(t + At)^^

y(p(t + At)^^ V(p(t + At)^^

4.2 Camera, Bisight Head, and Mobile-Base Models

In this section notation and preliminary models used for cameras, 4 DOF Bisight head, and mobile-robot base, will be infroduced. The modeling is based on the work of Davison [Dav98] which was designed and unplemented for Oxford's (Yorick) stereo head, and GTI mobile-robot. Davison has generously made his work publicly available and in light of possible continuing collaboration (see Wmgate & Davison [WD02a], [WD02b]) and benefits of standardisation, the following (and later derivations) relating to MROLR's architecture, have wherever possible used notation and equations consistent with those of Davison.

MROLR's stereo-head appears somewhat different to Oxford's Yorick 8-11 (Fig 4.1.2.1), but geometrically the differences are relatively minor, both having 4 degrees of axial movement. Consequently both can be modeled in a similar way.

The TRC mobile-base used by MROLR differs significantly from that of Oxford's GTI mobile-vehicle. The GTI base is 3 wheeled, with the motor driven rear wheel being responsible for steering. In comparison, the TRC (Fig 4,2,6,2) utilises 2 centie positioned motor driven wheels and 4 comer stabilising casters. An advantage ofthe TRC's configuration is that rotations about its central axis are possible, in some instances simplifying the "shortest route navigation contiol" to a rotation followed by a tianslation followed by a final orientation rotation. In modeling the TRC base, the full range of possible movements will be included.

Chapter 4 Background on Models and Notation 40

(a) Bi-Sight head (b) Oxford's Yorick 8-11 head

Fig 4.L2.1 BiSight and Yorick 8-11 stereo heads

4.2.1 Camera Model

Fig 4.2.1.1 shows the basic geometry of the widely accepted pinhole camera model, used throughout this work. Effectively the pinhole model is an ideal camera model in which the rays from world points to image points pass through the camera's focal point, producing an inverted image on the cameras image plane. For ease of modeling, a virtual image plane may be placed in front of the optic centre (as in Fig 4.2.1.1), resulting in projected upright image. Consider a Scene point P.

World Frame W

Scene Point ,P

Virtual Image Plane

Camera Frame F

Optic Centre

Fig 4.2,1.1 Geometry of pinhole camera model

In world frame W, and camera frame F, this has coordinates

Chapter 4 Background on Models and Notation 41

w

r pw \ X

.W

V

and

r pF\ X

, p''

respectively.

To save case confusion (i.e., P'' not to be confused with p^), m^ will be used in place of P^, and p*" to designate the corresponding position vector of the image point (Xc, yc, f ) as shown m the above camera model. Thus:

r pF\ X

. p'^ .

m = m

y^z J

For the pin hole camera model, with the virtual image plane in front ofthe focal point.

'x^

yc

f

f m

' m, ^

m

v'«z y

-tm^ m

4.1

Light rays impinging on discrete charged couple device (CCD) cells, which make up the camera's image plane, are responsible for the camera's image. This image is mapped to pixels on the computer screen via the computer's frame buffer, and it is from this buffer that the software will normally obtain image information, A point p = (x,y,f)^ in camera coordinates, projects to computer pixel coordinates of:

" = «o-^«^c ^ = ^o-Kyc 4.2

Where horizontal and vertical scaling parameters ku , ky accommodate the mappmg from the camera's CCD image plane to computer screen via the computer's frame buffer, ku and ky represent the "effective" number of horizontal and vertical pixels/mefre respectively, and are obtained via the calibration procedure outlined in 4.3.2.1. The pixel position (Uo,Vo) corresponds to where the optic axis passes through the optic centre of the lens. In the ideal case (Uo,Vo ) will also correspond to the image centre (i.e., for an image size 600 x 400, Uo=300, Vo=200 pixels, respectively, and u = 0,v = 0 corresponds to the top left hand comer ofthe unage).

In matrix form equation 4.2 become:

Chapter 4 Background on Models and Notation 42

'u^

V

.K =

-K 0 0 -k^

0 0

" o ^

vo// 1//

'x,^

yc

[f.

or

f"l V

.K =

_ ;

0

0

^u 0

- ^ v

0

Uo/f

Vo/ /

1// 4.3

And since vector m is proportional to p , it is evident that

^u^

Kh a

- A 0 Mo

0 - A Vo 0 0 1

m 4.4

or

^ M ^

vly

a Cm 4.5, and m^ a C '

^u^

vly

4.6

where

C = - A 0 Mo'

0 - A Vo 0 0 1

4.7 C = 0 - 1

^ 0 - ^ #„ A

A A 0 0 1

4.8

Intrinsic camera parameters f, ku, ky, uo and vo are normally obtained by a calibration routine involving a tile. The calibration procediu-e used in this work is described in section 4.3.2. Matrix C is referred to as the camera calibration matrix, and contains the camera's intrinsic parameters. Equation 4.5 is the well established perspective projection equation relating a 3D

Chapter 4 Background on Models and Notation 43

scene-point, in the camera's reference frame, to the homogeneous vector

the pixel location of its image point.

''u^

v l y

which describes

Thus, knowledge of matrix C permits the 3D location of point m to be determined in the camera's reference frame, up to an unknown scale factor (equation 4.5). That is, it lies somewhere along the light ray that is coincident with vector m. To obtain its actual location along the light ray, using a single camera, requires at least two sequential views of the point, Altematively a pair of cameras may be used in a stereo configuration as formulated in the following section,

4.2.2 Stereo Head Model and Frames

The TRC Bisight, four DOF pan-tilt-vergence stereo head, is mounted dfrectly above the central axis of the base. The head is equipped with encoders that provide accurate information on angular movements, A model of this active head is formulated below to facilitate the calculation of 3D Scene points from corresponding, left and right camera image coordinates, and the joint angles necessary for fixation on known 3D Scene feature points.

Each joint motion of the stereo head is initially monitored in an assigned reference frame and defined by an appropriate homogeneous transform. The motion of the active head is obtained from the product of these transforms. The respective reference frames are shown in Fig 4.2.2.2. The head-centre coordinate frame CO has its origin at the intersection of the pan and the tilt axis. This point remains fixed relative to the mobile base for all head movements. All measurements carried out on the mobile base are referenced with respect to CO, it is the vehicle frame. Its Z-axis lies in the forward direction, with the Y-axis pointing up and the X-axis to the left. This reference frame is also used for the robot arm in obtaining the transforms necessary to guide the gripper for grasping and retrieving of the object. Accurate modeling of the stereo head is thus essential for navigation and object retrieval.

Vectors Mh, PL, PR> CL, CR, UL, UR represent the head's intrinsic parameters, their scalar magnitudes remaining constant. Subscripts L and R refer to the Left and Right Camera frames respectively, PL, CL, HL represent the Left Camera offsets from the origin of frame CO, while PR, CR, DR denote the corresponding offsets for the Right Camera. Vector HIG represents the location of a Scene feature measured in frame CO, Vectors mL, and mR are vectors from the camera's optical centies. When all head angles are zero, both the right and left camera will be pointing sfraight ahead and all head axis will be aligned

Chapter 4 Background on Models and Notation 44

Right Vergence Pan

C -^^

CO , i-'--:]

' CO

Head coordinate frame X ^ , 1 co> Z{^

T:) -3 eft Vergence

Vectors Mh, PL, PR, CL, CR, HL, HR, represent the head's intrinsic parameters, their scalar magnitudes remaining constant. Subscripts L and R refer to the left and right camera frames respectively, mc a 3D scene vector point.

Fig 4.2.2.1 Stereo Head showing Intrinsic Parameters

Right vergence frame

head rotation anele YL, YR = vergence a = pan e = tilt

vergence frame

"-%i

Fig 4.2.2. 2 Stereo head joint motion reference frames

Chapter 4 Background on Models and Notation 45

4.2.2.1 Mobile Base Mh and Head-frame CO

In this frame

Mf = ^0 ^

vO J

C co\

w^CO

m^ = m. CO Gy 4.9

Mh is the vertical scalar distance ofthe head centre from the centre ofthe mobile base.

4.2.2.2 Pan Framed

The pan frame Cl is fixed relative to those parts ofthe head that move as the head pans around. Its y-axis remains vertical and the corresponding x and z-axes remain in the horizontal plane while the x-axis remains parallel to the head's tilt axis. It is horizontally rotated with respect to CO by the pan angle a.

[n this frame:

P I =

7/2^ 0

<p J

. P/? =

(-I/t 0

< p >

4.10

Where I is the horizontal distance between the vergence axes and is the inter-ocular distance between the camera optic centres if both cameras are facing forward. PL = PR = p are the horizontal scalar offset distances between the intersection ofthe pan and tilt axes, and respective vergence axis. The respective p vectors will rotate with a pan movement.

4.2.2.3 Tilt Frame C2

This frame remains fixed relative to parts of the head that move when the tilt angle changes. Its Y-axis remains parallel to the left and right vergence axes and its X-axis is aligned with the tilt axis. It is vertically rotated with respect to Cl by tilt angle e.

In this frame:

^ 0 ^ , C 2

v O y

C 2

^0^

v O y

4.11

CL = CR =C is the scalar offset along either vergence axis between the intersections with the tilt axis and the camera optic axis

Chapter 4 Background on Models and Notation 46

4.2.2.4 Left L and Right R Frames

The left and right camera frames L and R remain fixed relative to the camera's respective optic centres, Thefr Z-axes are aligned with the cameras optic axes and their X and Y-axes lie parallel to thefr image planes. These frames are rotated with respect to C2 by vergence angles Jt and

JR-

In these frames:

n , = 0

v«y

m

^m' ^

m

Lx

L

Ly

L

\^U J

4.12

n„ =

^0^

0

v « y

m

m

m

^m

R Rx

R Ry

R Rz

\

)

4.13

UL = UR = n is the scalar offset along either optic axis between the camera optic centre and the intersection with the vergence axis

In summary, vectors PL, PL, CL, and UL combine to represent the vector offset from the head cenfre to the left camera's optic centte. PR, PR, CR, and UR represent the same offset for the right camera.

The following rotation matrices permit transformation between the various head-frames.

Note a clockwise rotation is +ve (axis of rotation arrow pointing at observer).

lOO J^C.U ^

,LC2 R" =

cos a 0 -sina

0 1 0

sin a 1 cos a

cos yL 0 - sin yi

0 1 0

sin yt 1 cos y 1

4.14

4.16

R " ' =

R " " =

"l

0

0

cos ytt

0

sin yx

0 0

cos e - sin e

sin e cos e

0 - sin yR

1 0

1 cos yR

4.15

4.17

Chapter 4 Background on Models and Notation 47

4.2.3 Calculation of Image Coordinates from a Known 3D Feature Point

With the above framework and reference to Figure 4.2.1.1 it is apparent that:

niG = P L + CL + UL + mL 4.18

mc =pR + CR + UR + mR 4.19

Assume for the moment that m^ the 3D position of a pomt m the vehicle reference frame CO is known and visible in both cameras. From equation 4,18 in the left camera frame L:

m^ = m ^ - p i - m i - c i - n i 4.20

Or in the desired frames:

^L =^ ™G - R P i - R ^L-^L 4.21

Rearranging the transformations into relevant simple components,

mi = R ^ ^ ^ ( R " ' ( R ^ ' ° m ^ ° - p f ' ; - c ^ ^ ; - n i 4.22

Similarly for the right camera: ^R D ^C2/T> C21 /-T* C10,„ CO „ C 1 \ „C2 \ „R . « , m ; j = R ( R ( R m G - P / j ; - C / , ; - n ^ 4.23

Equations 4.22 and 4.23 allow the vectors from the left and right optic cenfres to the scene point in the respective camera coordinate frames to be calculated. Furthermore, equation 4.5 (applied to the left and right camera, see equation 4.24 below) permits the point's image coordinates to be determined. Further details are given in section 4.2.5,

ru ^ ru \ "L

VL

V J

a Cm2

Uj,

Vy?

V' J

a Cm J 4.24

4.2.4 Head Angles to Fixate a Known Point

A frequent task requfred during navigation and map building is to fixate on a known 3D point. This consists of knowing m^° relative to the head-frame, and driving the head so the image of the 3D point passes through uo, VQ the principal point, in both left and right cameras. To simply the task, vergence angles are assigned to be equal and opposite. This form of symmetric fixation is further simplified by tuming the pan axis to view the scene point such that mo in pan frame Cl, ties in vertical Y, Z plane.

hi this state:

m a.' = 0

Furthermore:

Chapter 4 Background on Models and Notation 48

^ C l

i.e.

= R

cos a

0

sin a

CIO

0

1

1

^ CO

m c

-sin a

0

cos a

HO mS

V"GZ)

=

"'GX

<

K°j

4.25

4.26

Solving the top line of this equation for a gives

CO

a = tan 1 ma.

m CO Gz

4.27

To find the remaining frame angles e, YL, YR, use is made of the fact that as the optic axes of both cameras ideally pass through the scene 3D point, therefore m ^ and m " are both non zero in the z coordinate:

mt=mty=ml=ml=0

Writing equation 4.18 in frame L:

r^t- _ t> iC2/-D C21 /TO CIO_, CO „ Cl \ „C2 > „l m^ = R (R (R m^ -PL )-^L )-^L

Solving for e by setting m^^ = 0 yields:

4.28

4.29

e = tan m

Cl Gy

^G! - P - sin

^fm^; )'+(mg'-p)' 4.30

Note the components of m * are determined from knovm a. The remaining vergence angles y

are found by using e to form m "

„ i t > i C 2 , „ C 2

Setting nij^ = 0 , / ^ is solved from:

.C2

4.31

yi = tan -X^Lx

m Cl Lz

4.32

And fmally Y^ obtained from:

yR-ri 4.33

4.2.5 3D Point Location from Image Coordinates UL, VL, UR, VR

Given the left and right image coordmates UL, VL, UR, VR of a scene pomt, its 3D location is determined as follows:

Chapter 4 Background on Models and Notation 49

Choosing the head-frame CO from equation 4.18 ,

m CO .CO , „C0 CO

= Pi +c^ +n^ + m , CO

and rearranging this by fransforming the vectors into their desired frames.

m CO COl^Cl , C02„C2 i C O i ^ Z , = R^"'p^' + R^"^c^^ + R^"^n^ + R^"^mi

then regrouping;

m CO C01/„C1 C12/„C2 C 2 I „ Z , = R^"Ypi' + R^'Yci" + R^'^nJ ;; + R^'^R^'^R^'mJ

Similarly for the right Camera:

m CO . C01/_C1 C12/„C2 C1R„R CQXnCn-nClR^R = R^"YP« + R'-'Yc^ + R ' 'n^ ;; + R^ 'R ' R^ ' m^

4.34

4.35

4.36

4.37

(^0 I R

While these give two expressions for m^ , neither is sufficient as m^ and m^ are calculable only up to an unknown scale factor by inverting equations (4,24) i,e, rewriting equations 4,36 and 4,37 in the form:

m^ aC

^UL^

VL

V y

, m^ a C -1

fuR^

VR

V y

4.38

.CO m = a -i- /Lb CO .CO

4.39

^ C O „C0 , „ j C O

m j =c +//a Where

.CO C 2 I _ i (Pi + R (Ci + R ^LJ)

4.40

4.41

.CO T>C01 „ C 1 . x>C12/„C2 , T>C2/?„/? M

' = R (P/f + R ( c -HR n^)) 4.42

CO c o i n C\2 -n ClLi^ -\ = R ^ " ' R ^ ' ^ R

r,j \

V '• J

4.43

I CO 1^ COl | i C12 | i C 2/? p - 1

U,^

V '• y

4.44

Where X and p are the unknown scale factors. Effectively each camera contains a ray along which the scene point must lie. Ideally the scene point will be at the intersection of these two rays. In practice, because of errors the rays may not intersect and consequently the best estimate

Chapter 4 Background on Models and Notation 50

of the scene point will be the mid-point of the rays closest approach to each other. That is, the mid-point ofthe vector normal to both rays (Figure 4,2,5.1),

r = a + ;ib

Mid point of Normal between rays

r = c-hjUd

Pa

Fig 4.2.5.1 Mid-point of normal between 2 rays

The coordinates of the mid point of the normal between these two camera rays is readily obtainable in terms of vectors a b c and d by calculating X and p for pouits Pi and Pi.. A method for calculating these follows.

Common perpendicular between two rays and location of mid point

given {aP^} = known location on ray 1 {li il = knovm unit direction of ray 1 { c^ } = known location on ray 2 {ii 2} = knovm unit direction of ray 2

Note enclosing {} brackets imply column vectors.

Calculating unit dfrectional vectors:

{u,}=b^Vr«om(^b^°;; 4.45

iCO CO {U2}= d ^ V ^ ^ r m ^ d ^ V ^ 4.46

y = angle between rays sin y = norm( {ui}x{u2}) = norm( [ u 1] {u 2}) {ii 3} = unit direction along common perpendicular {ii3}=[ui]{ii2}/sinv|/ di2 = directed distance (length of common perpendicular) along {ii 3} from ray 1 to ray 2

{a^'} + Mul}+d{U3} = {c^°} +H{U2}

[{u .} { - u j {^3}]

A

12 J

>= ({c-}- {a-})

4.47

4.48

Chapter 4 Background on Models and Notation 51

= [{«i} {-U2} {63}]-' ({c^14"})

or

4.49

find midpoint of common perpendicular using confluence of 2 rays.

{mf } = ( [ i i i ] ' + [ir2]')-' ( [ui] '{a^'} + [u2]Mc^°}) 4.50

Note

[11] = skew-symmetric matrix for {u}

{u}= Ux

•S' . " z .

[u] = 0 - u ,

u, 0

Uy

- u .

-u.. u. 0 ' y - X

i T _ [if] '=-[u]

[ U f = { u } { u r - [ l 3 ] = U x - 1 U^Uy U,U,

X V V y z

^ x ^ z ^ y ^ z z

Chapter 4 Background on Models and Notation 52

4.2.6 Mobile Base Model

The configuration differences between Oxford's GTI mobile-base and MROLR's base mentioned in Section 4.10, amount to a difference in the control variables. The 3 wheeled GTI base is steered via its economical single motor driven rear wheel. Consequently the control inputs are the velocity and rear wheel relative angle. By contrast MROLR's base is controlled by 2 motor driven wheels as shown in Fig 4.2.6.3. Optical encoders provide feedback for control and odometry information. This has the advantage that by setting vi= -V2 the robot is capable of rotating on the spot about its own central axis. In addition, shortest path movements are simple to achieve by an initial on the spot rotation, followed by a linear translation to the desired location and then a final on the spot rotation to end up in the required orientation. The analysis begins by defining the angular velocity of motion co in terms of the vehicles rate of change of angular position (see Figs 4,2.6.1/2).

(O = de dt

4.51

Robot Position (z, x, ^ ) z' ,00

Robot X Coordinate Frame V, A

(/Vorld Reference Coordinate Frame

Robots Initial Position (0.0,0)

V2^

Fig 4.2.6.1 MROLR moved to position (z,x (j))

Assuming inner wheel and outer wheel tangential velocities of vi and V2 metres/sec respectively, the centre vehicle velocity v can be obtained from:

Vl + V2 _ V = = o)R 4.52

Chapter 4 Background on Models and Notation 53

For a given angular velocity © ofthe centre ofthe base, the tangential velocity of each wheel is related via vi=(]i)(R+Di), and V2=CD(R-D]), the radius ofthe arc moved through by the base can be derived as follows:

The arc length moved through in time / seconds is R0, thus

^ J ^ V1 + V2

dt ^ 2 4.53

Solving for R from vi=a)(R+Di)/ V2=co(R-Di) gives:

R^ Dx(^x+^2)

(^1-^2)

4.54

Robot Base at time k+1

Head Centre Coincides with Robot Base Centre

R Robot Base at time k

Posiiion(|<-i-i)

For Comparison: Oxford's GTI Mobile Robot Geometfv

Fig 4.2.6. 2 Oxford's mobile robot geometry

Using equation 4.54 to replace R in equation 4.53 and using a time increment of At, leads to . „ V I - V 2 ^

A0 = At 2Z),

For a constant time interval At the change in ^, will be represented by ^in place of A^

^ V 1 - V 2 ^ I.e., 0 = At

2D, 4.55

Chapter 4 Background on Models and Notation 54

4.56

Consider a circular arc trajectory increase of 0 degrees (Fig 4.2.6.3), the changes in z, x, (j> are readily seen to be : z(t+At) = z + R( sin ((j) (t)+e) - sin(<t) (t)) ) or rearranging

z(t+At) = z(t) + R(cos(|)(t)sine + sin(|)(t)(cose-l))

x(t+At) = X + R(cos (<|)( (t) - cos ((j) (t) +0)) or rearranging gives

x(t+At) = x(t) + R(sin(t)(t)sin0 + cos<|)(t)(l-cos0))

and

(t)(t)(t+At) = (|)(t) + 0

When Vl = V2 = v (i.e., straight line case) 0 limits towards 0 and R limits towards oo, the equations may be rewritten in the form:

z(t+At) = z(t) + vAtcos<t>(t)

4.57

4.58

x(t+At) = x(t) + vAtsin<|)(t)

(t)(t+At) = (t)(t)

4.59

4.60

4.61

z + R(sin ((t»+6) -sin((t())

World Q Q Coordinate

X + Rcos (if) -Rcos ((((+6)

Fig 4.2.6.3 Geometry of a circular arc trajectory

Chapter 4 Background on Models and Notation 55

In general the position ofthe robot base can only be estimated to a degree of uncertainty witii the use ofthe above equations. This is because: (1) Wheel slippage may occur due to uneven terrain. (2) The confroUer will have some calibration error both in terms of distance movement and velocity. (3) Wheel diameter differences will exist due to tyre pressure variations. (4) A collision with an obstacle may have occurred.

Additional positional information to that available from wheel odometry is thus essential for reliable robot localisation estimates. In this work additional positional information is obtained with the use of active stereo vision to acquire landmark visual features, which in tum are used for map-building of the local envfronment, and also self-localisation. The availability of a map is a usefiil tool when navigating in a previously explored envfromnent, and can considerably reduce the risk of becoming lost.

Landmark visual-feature acquisition, map building and navigation are covered in Chapter 5. Of paramoimt importance to this work is the ability to identify and accurately locate a target object. Identification is reliant on having accurate model descriptions ofthe object so that matching can be effected. Two models are used, a 3D object-model created from the integration of multiple views of the object, and a 3D scene-model created from only a single view of the scene (sometimes referred to as a 2,5-D model). In the following tools and algorithms required for building these models are described.

4.3 3D Object and Scene Modeling and Recognition

As outiined in Section 3.2, TINA, Sheffield's Machine Vision Research platform with its comprehensive suite of libraries and tools was adopted as the software environment for developing the model building and recognition modules used by MROLR. Other competing image processing environments such as TargetJr/IUE [TargetJr], Khoros [KHOROS], and Horatio [HORATIO] were also considered as potential platforms, but ultimately were judged not as convenient to use, or in other instances not chosen because of their lateness in arrival on the scene, A view ofthe TINA main interface module window is displayed in Fig 4.3.1 It will be noted that as typical in Windows type program environments, the software is essentially event driven, with the events consisting of mouse button presses. This is very usefiil when studying isolated sections of work or specific options, but not conducive to a stand-alone system, requiring essentially only one "initialising" mouse click. A considerable portion of the effort behind this work was in writing and integrating all of the modules (incorporating three different software platforms) to obtain an autonomous system controlled by input events arising from MROLR's physical environment.

Chapter 4 Background on Models and Notation 56

- •#! tinatool D X

Display NewTvtoolJ View) Terrain) Exit) Help

Macro Rle: defauil ^ append) close) m^

file i/o Mono J Stereo j Sequence)

Tools calib j -Imcak j Edge j Corner)

Steletonj

Fig 4.3.1 TINA's main interface window

Prior to outlining the modules/sub-tools utilised, the relevant algorithmic and programming approaches incorporated in TINA will be briefly reviewed. Some of these descriptions are summaries of work in [Tin97a], [Tin97b], [Tin99]. It should be pointed out that much o f t h e following is now well understood and has been incorporated in numerous machine vision applications for some time. This specifically refers to camera modeling and calibration, epipolar geometry, image rectification, edge-extraction, stereo matching, depth recovery, disparity gradient constraints, 3D model-creation and the like. In light of this, the following descriptions are intentionally brief; their inclusion here is purely for the sake of completeness.

4.3.1 Parallel Camera Geometry For Non-Parallel Stereo

Using parallel camera geometry (i.e., parallel optical axes, and image planes constrained to be in the same plane) can considerably simplify both stereo correspondences matching problems, and the subsequent 3D reconstruction. This simplification arises as the epipolar lines of images are parallel to the baseline, and disparity analysis can now be restricted to a one-dimensional search along these horizontal lines. If calibrated cameras are used, it is possible to construct an equivalent (virtual) parallel camera geometry by a process of epipolar rectification (refer Fig 4.3.1.1). The rectification process can be implemented by projecting the original images onto the new image plane such that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes (usually the horizontal one). The rectified images can be thought of as acquired by a new stereo rig, obtained by rotating the original cameras, (See for examples, [Aya91], [PPPMF88], [FP02]),

Chapter 4 Background on Models and Notation 57

^ 1 r ^ cy

5 < ^ " ^ k V

^ 1 ^ " " ^ " ^ - - . - ^

X < 2

v f ? ^ .^0"""'^ "Z

Stereo rectification: original image planes ^ i 4 2 and associated points pi and pz re-projected onto a common plane

XJ/jjXJ/j parallel to base line C1--C2 Note the epipolar lines I j , I j are re-projected as the common scanline I i , l 2 and are also parallel to the baseline.

Fig 4.3.1.1 Convergent to rectified stereo re-projection

At the additional cost and associated overheads of a third camera (i.e., trinocular vision)) or even a fourth camera, it is possible to eliminate the uncertainty associated with a one-dimensional search. See for example Ayache and others [Aya91], [Jar94].[FP02] ). Eariy trials using trinocular vision were evaluated (Wingate [Win99], also Appendix A) but later abandoned in favour of binocular stereo for computational cost reasons. An interesting comparative study on disparity analysis based on convergent (i.e., non-rectified) and rectified stereo was carried out by Schreer et al [SBKOO]. They concluded that each method had additional overhead costs that needed to be considered prior to deciding whether the advantages of implementing rectification outweighed these computational costs.

4.3.2 Modelling and Recognition Software Environment and Associated Tools

A compelling reason for using the TINA Machine Vision Research Environment is the diverse range of software tools provided. The following modules were found to be of particular value:

(i) Calibration Tool (ii) Edge Tool (iii) Stereo Tool (iv) Matcher Tool

Chapter 4 Background on Models and Notation 58

Brief details of these tools are discussed in the following sections, greater details are available in [Tin97a], [Tin99] from which much of this description arises. Details of supplementation to these tools, which is considered an original contribution of this work is outimed in Ch5.

4.3.2.1 The Calibration Tool.

This tool is provided for estimating the intrinsic and extrinsic camera parameters of a stereo vision system, (the widely applied pin hole camera model is used). A calibration tile is utilised and several altemative calibration algorithms are available. These are briefly mentioned below. Arbifrary parameter selection and adjustment, radial distortion parameters, estimation of calibration parameter covariances, for purposes of optunal combination, are features also provided.

Calibration Methods The following calibration procedures are included: Tsai. This widely used routme, first developed by R.Tsai [Tsa87] is initially used on each camera, and while retuming satisfactory parameter estimations, improved results may be obtained by additional processing (i.e., following Tsai with IP min etc).

IP min. This is a direct image plane minimisation routine that operates on the difference between the observed and predicted locations of selected data. An iterative minimization algorithm relying on initial estimate values is used. Left and right camera calibration is performed separately.

IS min. This routine also performs dfrect image plane minimisation of the difference between the observed and predicted locations of selected data. Left and right cameras are calibrated simultaneously, unlike the routine above,

EPI min. This routine performs an off epipolar minimization. The routine retums the left and right camera models that are most consistent with the observed stereo geometry. A limitation of the routine is that only information on the ratio of the two focal lengths and the relative fransformation between the camera coordinate systems up to a scale factor is possible. Furthermore, a covariance estimate from a previous calibration must be used to constrain the minimization process. Additional supporting covariance estimation routines are also provided with the tool.

The procedure for calibration of a stereo rig requires a calibration tile to be placed in front ofthe two cameras, inclined at approximately 30 degrees to the cameras focal axis, and fully visible in both images. These images are then pre-processed with the Caimy operator (Edge Tool). Using an initial estunate ofthe pin hole camera model, the Tsai algorithm is used to obtain an improved estimate. Following this calibration, residual distributions can be examined and re-calibration appUed as necessary, using any ofthe above routines until good results are obtained.

Fig 4.3.2.1 illusfrates the process of calibrating the stereo cameras using a planar tile. Accurate location and matching of all comers is essential for good parameter estunation. The "adequacy" of results can be visualised by performing a 3D reconstruction ofthe tile itself.

Chapter 4 Background on Models and Notation 59

4.3.2.2 Stereo Tool

The fimction of the stereo tool is essentially input/output generic file manipulation and display fimctions for stereo image data and derived geometric features. It controls 3 display tools, one for each of the left image, right image and resultant 3D stereo matched line/coiuc (i.e., wfre frame) data obtained following processing with the Edge Tool.

4.3.2.3 Edge Tool

This tool is used primarily as an interface for performing edge detection, and edge based stereo processing. Its main application for this work has been in the recovery of 3D wire frame geometrical scene descriptions used in conjunction with the stereo tool. It utilises the following procedures:

Canny edge detection: This edge detection routine comprises of a two-stage process: In the first, edges with intensity gradients above a low threshold are identified, while in the second the edges are linked into strings. (Edge strings that are entirely below an upper threshold are eliminated). Final linked canny edges are obtained, to sub-pixel accuracy.

Edge rectification: The position of the edge data, obtained from use of the Canny edge extractor, are recalculated to be consistent with parallel camera geometry. The process is followed with stereo matching, and 3D geometrical recovery.

Stereo: An effective matching algorithm, derived from the "PMF" stereo algorithm [PMF85] is used, which relies strongly on support provided by local disparity gradient consfraints. It utilises the list edge-string structures provided from Canny edge extractor. Refer to section 6.2,1 for details of TINA geometric list structures,

Geoni2 and Geoni3: Standard polygonal algorithms are applied to matched linked-edges to identify and obtain 2D and 3D Une and conic structures respectively. The resulting 3D geometry is with respect to the left camera (virtual parallel camera geometry) coordinate frame, the origin of which is at the optical centre.

Chapter 4 Background on Models and Notation 60

Size Tj

install )

rect: def i

Mouse '

clone

ne/s

-)m r)

) '"it

^ow/ null

ProJ ^j

) repaint )

Size r j House v) ROI ^ j ProJ ^ )

install J clone ) inlt ) repaint)

rec t :de f ine / s h o w / null

D X

Size -J Mouse '-j ROI ^jl ProJ r j

Install j clone J init j repaint )

rect: define / show / n u l l

MiS (a) Left image of calibration tile (b) Left & right image comers found

-^f:msami^mm • x Size ^J Mouse ^J ROI ) ProJ -)

install) clone j init j repaintj

null

1 i_J 'i__j.

1

Size ^) Mouse ~j ROI j ProJ ^)

install) clone ) init J repaint )

null

' -—-

Size rJ Mouse -j ROI -j ProJ ^)

Install J clone ) inlt ) repaint J

null

(c) 3D reconstruction of calibration tile

(d) 3D views of calibration tile rotated to verify accuracy of planar reconstruction.

Fig 4.3.2.1 Calibration of stereo cameras using planar tile

4.3.2.4 Matcher Tool

This tool was designed to provide 3D model creation and matching facilities, but at the time of initial introduction (1999), it was only partially operational. N. Thacker [Tha99] in consultation with M.Wingate, contributed by removing a number of limiting programming bugs, and the tool was subsequently modified and added to by M.Wingate, as part of this work, to incorporate a 3D object-model builder and recognition module. Greater detail of this work is given in chapter 5, The model matcher algorithms provided and modified, are used in the recognition module both

Chapter 4 Background on Models and Notation 61

to perform 3D matching of object-model features to scene-model features and to compute the transformation which takes the model mto the scene.

4.3.2.5 TINA Generic List Structures

A significant feature of TINA's system libraries is the consistent use of generic list structures to provide coimectivity amongst the various data types. These provide the means of building software with good modularity, regularity and robustness. Two basic representative list stmctures are used. These are (a) the singly connected list structure and (b) the doubly coimected list structure. Each type may be applied recursively. Doubly connected lists, indexed by thefr start and end are termed strings and are a widely used structure to store and process segmented image edges (edgels). Five 3D geometric stmctures frequently used in this work are vectors, pomts, lines, planes and conies (curve geometry i.e. arcs, ellipses, cfrcles etc). Detailed code of these structures is given in section 6.2 where thefr usefiilness in the development of a "3D Model Editor", is more explicit.

4.4 Conclusion

This chapter has outlined the mathematical notation and models used throughout this thesis. An important outcome was the development of a stereo head model that enables feature measurements to be referenced to a head frame that remains stationary with respect to the mobile base reference frame. Model motion equations were developed for the mobile base, and a summary ofthe models and methods used in the vision research platform "TINA" was described.

hi Chapter 5 the motion equations for the mobile robot base are extended to include the robot's estimated position vector fv and confrol vector u. These together with the camera and stereo head formulations, developed in sections 4.2.1 to 4.2,5 are used to describe a "Simultaneous Localisation and Map Building" algorithm suitable for autonomous navigation, utilismg active vision.

Chapter 5 Navigation 62

Chapters Navigation

5.0 Introduction

This chapter commences by restatmg the preliminary mobile robot model equations of motion developed in chapter 4, and then describes a Sunultaneous Localisation and Map-buildmg (SLAM) algorithm suitable for autonomous navigation ui previously uncharted environments. The algorithm utilises the well known Extended Kalman Filter and is based on the map-building and sequential localisation algorithm developed by Davison [Dav98], [DavOl] for the Oxford GTI Mobile Base, and the collaborative work of Wingate and Davison [WD02a], [WD02b]. For this work, in its current phase, navigation in a totally unknowoi envfronment is not requfred. The system is intended to be capable of navigating to given sets of world coordinates where there is some likelihood ofthe targeted object being in relative close proxunity, (e.g. to a table in a room, with recognition of the table not being a requirement). A limited obstacle avoidance module based on visible feature detection has been implemented to enhance navigational reliability. If an obstacle is detected, information relating its distance from the robot is retumed and so movement around the obstacle can be scheduled. A limitation in the current implementation is that many obstacles do not have detectable features. For greater robustness in obstacle detection the fiiture addition of a sonar sensor mounted on a miniature pan/tilt platform is plarmed. Details of the design and constmction of the motorised pant/tilt platform are given in Burley and Wingate [BW02].

In general because of possible wheel slippage, calibration and measurement errors, the position of tiie mobile-base, calculated from odometry, can only be estimated to a degree of uncertainty. Appropriate modeling can ameliorate location uncertainty due to noise and measurement errors. For instance, wheel-velocities vi and vi (subsequently used as control input parameters) may be modelled assuming their errors to have a Gaussian variation with standard deviations proportional to the velocity demands themselves. Values of ai=0,14 vi, and 02= 0.14 V2 were found by trial and error to give good results. These expressions for uncertainty are incorporated in position estimation modeling m the derivations to follow. For circumstances where severe wheel slippage or obstacle collisions occur, additional positional information to that available from dead reckoning is essential for reUable robot self-localisation estimates.

In this work additional positional information is derived via the use of active stereo vision that acqufres landmark visual features, which in tum are used for map-building of the local envfronment, and self-localisation. A feature map is a powerful tool when navigating in either previously explored or unexplored environments. For the situation of navigating in new environments, provided suitable features can be found from the commencement of motion, these features can be used for improving self-localisation estimates, as well as map building. Once several features are available for tracking they substantially reduce the risk of becoming lost, in the same way as a land-map can assist a human in fmding their bearings. Known mapped features (i.e., features mapped during a previous voyage through the terrain) can likewise be helpfijl in the prevention ofthe loss of bearings.

Chapters Navigation 63

5.1 Preliminary Mobile-Base Model Equations Revisited

Referring to Figs 4. 2.6.1/2/3 m Chapter 4, the following equations were arrived at and are reformulated here for clarity:

For constant time intervals At (see equ. 4.55), the change in angular orientation 0 is obtamed by: VI - V 2 ^

e = At 5.1 2 D ,

Ifthe mobile robot's position at timet is: x^ = x(t)

\<Kt)j

5.2

Following a constant time increment At the robot's estimated new position will be:

z(t+At) = z + R( sin ((t)(t)+e ) - sin((|)(t))) 5.3

rearranging to separate out variables ^ (t) and 9

z(t+At) = z(t) + R(cos(t)(t)sin0 + sin(t)(t)(cose-1)) 5.4

x(t+At) = X + R(cos ((|) (t) - cos ((!> (t) +8)) 5.5

rearranging to separate out variables (|)(t) and 0

x(t+At) = x(t) + R(sin(t)(t)sme + cos<t)(t)(l -cosB)) 5.6

and

(t>(t+At) = (l)(t) + 0 5.7

when Vl = V2 =v (i.e., sfraight line case) 9->0 and R ->co and these equations can be written m a simplified form

z(t+At) = z(t) + ((vi+V2)/2)Atcos(t)(t) = z(t) + vAtcos(j)(t) 5.8

x(t+At) = x(t) + ((vi+V2)/2)Atsin(|)(t) = x(t) + vAtsin(|)(t) 5.9

(l»((t+At) = (t)(t) + ((vi-V2)/(2Di)) At = (t)(t) 5.10

Chapter 5 Navigation 64

Defming the robot's estunated position vector fv and the input control vector u as:

fv = x(t + At)

[^(t + At) u =

S ^ ^ 2 ;

5.11

fv is a fimction of both Xv and u (refer equations 5,8-5,9) and will be normally be designated as fv(xv,u).

The covariance matrix Q of fv expressing the first order uncertainty in fyis:

Q=vrfv)„uvrf,): Where

5.12

V(fv)u =

V(z(t + At))^^ V(z(t + At)),^

V(x(t + At))^^ V(x(t + At)),^

V(tP(t + At))^^ V(<P(t + At))^^

V = 0

0 v2

5.13

U is the covariance matrix of control vector u, and a and a are standard deviation estunates ofthe *'i 2

errors in the velocity control input parameters vi and V2. V(fy)„ is obtained from:

V(fv)u =

V(zj^v(/?),, +v(z)^v(^;,, v(z)^v(i?;,^ +v(z)^v(^; V(^;^V(i?),, + V(x)^V(^;,, V{X)J,V{R),^ +V{x)e^{0),

" 2

2

v(^;v, v(^;v,

Where

V(R)^^=-.2^lJ^, V(7 ) = ^^^^^ rv,-vj r^i -^2)

(' , =TTr-^^ vr6/;,, = - - ^ A ? 5.14

2D, 2Z),

'^(Z)Q = R( cos (j) cos 6 - sin (j) sin 6) ^(z)^ = ( cos (f) sin 6 - sin (f) ( cos 0-1)) 5.15

Chapter 5 Navigation 65

V(x)g^ =(sin^sin 0 + cos <l>(\ - cos 0) ^(X)Q = R(sin<t)cos0+cos(f>sin0)

's/{z{t + M))^ W{z{t + At)), V{z{t + At))/

Wy)x..= ^i<t + ))z V{x{t + At)), V{x{t + At))^

y{tP{t + At)), V{^{t-^At)), V{,/>{t + At))^_

'l 0 -R(sm (^t) sin 0 + cos <^t)(^ cos - l)f 0 1 i?('cos^jf)sin^-sin^ir;n-cos6!); 0 0 1

For sfraight line travel v = vi = V2 the previous associated expressions sunplify to:

V(z)^^ = Atcos , V(z)^^ = Atcos ip V(x)^^ = At sin t^, V(x)^_^ = At sin

W^A, =1^^' ^(^K = - ^ ^ ' while Vrf,)„ and V(f,),^ sunplifyto:

5.16

5.17

Wfv)„ =

At cos (p At cos At sin At sin <j> 0 0

and

5.18

V(fv)x. =

1 0 -vAfsin<^(0

0 1 vAtcos^{t)

0 0 1 5.19

respectively.

5.2 Landmark Features For Map-Building and Navigation

For a vehicle relying on active vision to navigate autonomously in an uncharted environment, a methodical map-building process to include surrounding stable landmarks is essential. This map can then be used to calculate the location of the vehicle with reference to a convenient reference point.

Essential for any feature-based active vision navigation system is a reliable feature detector and discrete landmark features that are easily detectable from a wide range of locations and angles. In this work, feature detection for map building and localisation is performed automatically using the Z operator described by Shi and Tomasi [ST94], The Z operator locates patches in an image that are easy to frack due to their large intensity variations across a patch, the associated feature is the cenfre pixel ofthe patch.

Chapter 5 Navigation 66

2 = 1 patcti

SrS

S yS X 5.20

where gx and gy are the horizontal and vertical gradients of image pixel intensities.

Image patches of size 13 xl3 pixels are used. Being larger than requfred makes them readily distinguishable and thus more reliable, albeit at the cost of greater computational overheads. Patches usmg the equation for Z above are calculated about each pixel point. Eigenvalues ^i and X2 are next calculated and patches classified in ascending orders of interest accordmg to the values of the X pairs. Optimal feature patches are those with the largest small eigenvalues of Z, The significance of the smaller eigenvalue being large, is that the patch has high variations in intensity making it more visible and thus making the point feature easier to track.

5.2.1 Feature Detection and Matching

To perform a measurement on a feature requires that it first be located in both the right and left image. In the sections that follow, it is shown that it is possible to ascertain a region in the left and right unages comprising an ellipse within which a sought feature should lie with a high level of probability. Restricting the search to this ellipse considerably reduces the search overhead. To fmd a required feature, scanning in raster order of both the left and right image is performed and for each pixel in the ellipse region, a patch is calculated and compared with saved patches that represent target features. The comparison is performed using equation 5.21, which represents a normalised sum-of-squared-difference measure.

N'^Y. patch

gx - gx go g (. cr^

5.21

go and gi are image mtensity values corresponding positions in the saved patch and image patch respectively, and go' i' o ^*^ ^1 ® means and standard deviations ofthe intensities across the patches, n is the number of pixels in a patch. This expression is 0 for a perfect match, and measures the average difference in standard deviations of values above the patch mean intensity between corresponding pixels in the patches being compared. Davison found a threshold value for N of 0.9 to be a suitable choice (see [Dav98, page 45]. For consistency this manually set threshold value for N was also used (i.e. the lowest value for N below 0.9 is accepted as the best match in the search zone). Note that any patch representation is intrinsically viewpomt variant, as features look different when viewed from new distances or angles, A criterion for expected visibility of the featiue based on the differences between the original viewpoint and a new viewpomt, is formulated below

Chapter 5 Navigation 67

The feature is expected to be visible ifthe length ratio ——— lies in the range 0.7 to 1.4, and the |m„rigl

angle difference /3 = cos "' (—'" "'^ ) in the range 0 to 45°. |ni||m„. I I 11 ong I

Scene Feature

Original New position position

Fig 5.2.1 Scene feature as seen from two viewpoints

5.2.2 More elaborate feature types

Rather than representing scene feature points as a two-dimensional image patch, some authors [ZF94], [Ruc97] use planar regions in 3D world coordinates. Many real features would fit this planar description (i.e., wall marks, door panels etc). An added overhead with this type of feature is the need to obtain an accurate transform from world plane to image plane. This transform would ensure the image plane obtained varied in shape and size appropriately with each new view location, giving greater confidence in matched features. With small inter-occular base lines, obtaining accurate transforms is however difficult and is not used here for this reason. Other possibilities include line features and comer features. Fixating on either of these however is more difficult due to possible line fragmentation.

5.3 Feature Acquisition and Fixation

Fixation is the process whereby the optic axis of the camera passes through the scene feature point. Fixation ensures image processing occurs near the centre of the image where lens distortion is at a minimum, and also minimises the likelihood of part ofthe feature lying outside the bounds of the image plane. Acquiring a new feature consists of finding the best patch in the left-image and then fixating on the feature by moving the head so the optic axes of both left and right cameras pass through it. The process is as follows: once the left image patch has been located, the corresponding epipolar line is generated in the right image from knowledge of the head angles (camera motion). The epipolar line in the right image, is the image ofthe light ray on which the feature must lie. A match in the right image is sought in close proximity to this line (typically within ±7 pixels of the line), and if successful the 3D location of the feature is

Chapter 5 Navigation 68

calculated as outlined in section 4.2.3. From the 3D location, the head angles required for fixation are calculated and the head positioned accordingly. If the feature is located within a small radius (i.e., 2 pixels) of the principal point in both the left and right image, fixation is successful; else the feature position is recalculated and the process repeated. If after several attempts fixation fails, the feature is discarded.

5.3.1 Uncertainty of Fixation Measurements

Camera image plane

Scene image point V

Scene feature

Principal point • vo

Optic axis

Fig 5.3.1,1 Uncertainty of fixation measurements

Fig 5.3.1.1 shows a feature point with a small error in fixation. An accuracy of ±1 pixel would normally be expect in image measurements of detected features. To determine how this error is translated into angular errors in fixation, the vertical difference between v and vo in terms ofthe angle error e is examine as follows:

tanf = V-Vo

Chapter 5 Navigation 69

Where/is the focal lengtii and ky the vertical pixel density m pixels/mefre.

The effect of a small change 8v in v can be ascertained by differentiating the above equation as follows:

2 i: ^

sec 806 =

At fixation 8 will be small, approaching zero consequently sec^e tends towards 1 thus:

=> Ss =

For a fixation with an accuracy of 2 pixel (i.e., dv=2) ,fkv=807, S£^2/{S01) «0,0024rad ^C.15 degrees.

Head movements are repeatable to within the order ±0,1 degrees, consequently errors introduced by head movement motion are negligible in comparison to Se,

Essentially, the implication of the previous analysis is that an angular uncertainty of 0.0024 rad must be added to all head fixation angle measurements. The effect of this uncertainty on 3D scene measurements can be quite significant when depth is large, as shown in Fig 5.3.1.3. Error values were obtained using Fig 5.3.1.2, which shows the stereo cameras with an inter-ocular spacing of I metres, angular vergence uncertainty ε rad, fixation angles γL = −γR, angular fixation error ε rad = Δγ rad, error in Z of ΔZ = Bd, and error in X of ΔX = Ad.

Consider triangle ABC, with angle B = γL, angle C = γL − ε, and angle A = π − (B + C) = π − 2γL + ε. The error in Z is ΔZ = Bd, while the error in X is ΔX = Ad. Side BC = (tan(B) − tan(C))/(I/2); using the sine rule, side AB = side BC · sin(C)/sin(A), ΔZ = Bd = side AB · cos(B) and ΔX = Ad = ΔZ · tan(B).

The significance of these uncertainties with I = 0.25 m (the inter-ocular distance for our BiSight head) and vergence uncertainties of 0.005 rad is highlighted in the graphs of Fig 5.3.1.3.


Fig 5.3.1. 2 Converging stereo cameras showing potential measurement errors in X and Z

Examination of Fig 5.3.1.3 reveals that transverse errors (ΔX) are quite small, reaching a maximum of 0.04 m at a depth of 10 m, whereas the corresponding uncertainty in depth estimation (ΔZ) is in the order of 3.1 m. This is not unexpected: for large Z values the camera optical axes are near parallel, and consequently small uncertainties in the vergence angles translate into large depth uncertainties. An ellipse with ΔZ as the major axis and ΔX as the minor axis is a useful pictorial way of portraying these uncertainties. Accounting for them during localisation and map building is covered in the next section.
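The order of magnitude of these numbers can be checked with a short calculation. The program below is a minimal sketch only: it assumes a symmetric fixation geometry (tan γ = (I/2)/Z) and re-intersects a left ray perturbed by the vergence uncertainty ε with the unperturbed right ray, rather than reproducing the exact triangle construction of Fig 5.3.1.2, so its values agree with Fig 5.3.1.3 only in order of magnitude.

    /* Sketch: propagate a vergence error eps through an assumed symmetric
     * fixation geometry.  An approximation of Fig 5.3.1.2, not a copy of it. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double I = 0.25;      /* inter-ocular spacing (m)   */
        const double b = I / 2.0;   /* half baseline              */
        const double eps = 0.005;   /* vergence uncertainty (rad) */

        for (int z = 1; z <= 10; z++) {
            double Z  = (double)z;
            double g  = atan(b / Z);    /* true fixation angle                */
            double gl = g - eps;        /* left ray rotated towards parallel  */

            /* intersect perturbed left ray (from (-b,0)) with true right ray */
            double t  = 2.0 * b / (sin(gl) + cos(gl) * tan(g));
            double Zp = t * cos(gl);
            double Xp = -b + t * sin(gl);

            printf("Z=%4.1f m  dZ=%6.3f m  dX=%6.3f m\n", Z, fabs(Zp - Z), fabs(Xp));
        }
        return 0;
    }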


Fig 5.3.1.3 Fixation error estimations for I = 0.25 m and vergence uncertainties of 0.005 rad (ΔZ and ΔX plotted against depth in metres; ΔX reaches about 0.04 m and ΔZ about 3.1 m at a depth of 10 m)

5.4 Navigation: Map Building and Localisation

To be reliable, a map-building and localisation process must account for the uncertainties due to noise in any measurements made. An effective tool that incorporates the modeling of noise and provides optimal estimates of location is the Extended Kalman Filter. As with earlier authors [New99], [Cso97], [CDT00], MROLR's map-building and localisation algorithms use an Extended Kalman Filter.

5.4.1 Extended Kalman Filtering

Considerable literature exists on both the Kalman Filter (KF) and the Extended Kalman Filter (EKF); see for example [Sor85], [Gel96]. The KF addresses the general problem of estimating the state of a linear discrete-time controlled process in the presence of noise. The EKF is an extension of the KF that permits its application to systems whose state transition and measurement functions are non-linear.


5.4.1 Notation

The following briefly outlines the notation used for the Extended Kalman Filter. Further explanations accompany the subsequent formulations.

State transition function f: The state transition function is designated by f. This function incorporates the system dynamics and facilitates the estimation of the current state during a time step Δt in which no measurements are made.

State vector x: The state vector models the system; it contains the best estimate of the quantities of interest as the system evolves with time. Importantly, it incorporates any measurements z of those quantities made during a time interval Δt. The expected evolution of the state vector with the passage of time is encapsulated in f, the state transition function.

Current state estimate x̂: The current state estimate is stored in this vector and the covariance matrix P.

Covariance matrix P: The covariance matrix represents the uncertainty in x̂ due to noise Q in the state transition. P is a square symmetric matrix, with a dimension equal to the number of elements in x̂.

Noise Q: This variable provides a means of allowing for random or unaccounted effects in the dynamic model.

Measurements z and m: z is used to designate an actual measurement (i.e., using the active stereo head), and m a prediction of the measurement.

x̂(k+1/k): An estimate of the state x at time step k+1, based on the estimate at time step k, before the observation (measurement) made at time step k+1 has been incorporated.

Innovation v: The innovation v is the difference between the actual measurement z and the predicted measurement m calculated from the current state.

Covariance matrix of the noise R: This matrix represents the covariance of the noise in the measurement.

Gain W and innovation covariance S: W designates the Kalman gain and S the innovation covariance. The innovation covariance represents the uncertainty in ν (i.e., the amount by which an actual measurement differs from its predicted value).


5.4.2 The Extended Kalman Filter

Prediction

Initially, in a step of time Δt in which no measurements are made, an estimate of the state of the system x̂ may be obtained from the state transition function f and the system dynamics as follows:

x̂(k+1/k) = f(x̂(k/k), u(k))     5.22

where u is the vehicle's control vector specifying the wheel velocities and indirectly the steering angle (see equation 5.1).

P(k+1/k) = ∇f_x(k/k) P(k/k) ∇f_x^T(k/k) + Q(k)     5.23

During the state transition the covariance matrix changes to reflect the increase in uncertainty in the state due to noise Q. Both f and Q depend on u, the current mobile-robot control vector, which is a function of the wheel demand velocities v1 and v2.

State Update

Following a reliable measurement, the state estimate improves with this new information and the uncertainty represented by P reduces. In the following update equations, the innovation ν is the difference between the actual measurement z and the prediction m calculated from the current state, R is the covariance matrix of the measurement noise, W the Kalman gain, and S the innovation covariance, which represents the uncertainty in ν (i.e., the amount by which an actual measurement differs from its predicted value).

x̂(k+1/k+1) = x̂(k+1/k) + W(k+1) ν(k+1)     5.24

P(k+1/k+1) = P(k+1/k) − W(k+1) S(k+1) W^T(k+1)     5.25

where the innovation ν (the difference between the actual measurement z and the predicted measurement m) is obtained from:

ν(k+1) = z(k+1) − m(x̂(k+1/k))     5.26

The Kalman gain W is given by:

W(k+1) = P(k+1/k) ∇m_x^T(k/k) S⁻¹(k+1)     5.27

and the innovation covariance S (which represents the uncertainty in ν) from:

S(k+1) = ∇m_x(k/k) P(k+1/k) ∇m_x^T(k/k) + R(k+1)     5.28
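A minimal one-dimensional instance of equations 5.22 to 5.28 is sketched below, in which every matrix reduces to a scalar. The transition and measurement models used (add the control input, observe the state directly) are hypothetical stand-ins, not the vehicle or head models of this chapter; only the structure of the predict/update cycle is illustrated.

    /* One-state EKF illustrating equations 5.22-5.28 with scalar quantities. */
    #include <stdio.h>

    int main(void)
    {
        double x = 0.0;        /* state estimate                           */
        double P = 1.0;        /* state covariance                         */
        const double Q = 0.01; /* process noise                            */
        const double R = 0.04; /* measurement noise                        */
        const double u = 0.2;  /* control input (distance moved per step)  */
        const double z[] = { 0.23, 0.41, 0.58, 0.83, 1.02 };  /* measurements */

        for (int k = 0; k < 5; k++) {
            /* Prediction (5.22, 5.23): f(x,u) = x + u, so the Jacobian is 1 */
            x = x + u;
            P = P + Q;

            /* Update (5.24-5.28): m(x) = x, so the measurement Jacobian is 1 */
            double nu = z[k] - x;      /* innovation (5.26)            */
            double S  = P + R;         /* innovation covariance (5.28) */
            double W  = P / S;         /* Kalman gain (5.27)           */
            x = x + W * nu;            /* state update (5.24)          */
            P = P - W * S * W;         /* covariance update (5.25)     */

            printf("step %d: x=%.3f  P=%.4f\n", k + 1, x, P);
        }
        return 0;
    }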


5.5 Localisation and Map-Building

5.5.1 Formulation ofthe State Vector and its Covariance

At any given time, estimates of the mobile-robot and scene-feature locations (in the world reference coordinate frame) are stored in the system state vector x and the corresponding uncertainty of these estimates in the covariance matrix P. Equation 5.29 shows these in partitioned form; x_v is the mobile-robot's position estimate, and y_i the estimated 3D location of the ith feature. The dimensions of both the vector x and the matrix P change as scene-features are added or deleted. Note that for brevity frame subscripts have been dropped.

x = [ x_v ]        P = [ P_xx    P_xy1    P_xy2   ... ]
    [ y_1 ]            [ P_y1x   P_y1y1   P_y1y2  ... ]
    [ y_2 ]            [ P_y2x   P_y2y1   P_y2y2  ... ]
    [  :  ]            [  :        :        :         ]     5.29

where

x_v = [ z ]        y_i = [ X_i ]
      [ x ]              [ Y_i ]
      [ φ ]              [ Z_i ]     5.30

Note that x comprises 3(n+1) elements, while P is of size 3(n+1) × 3(n+1), where n is the number of mapped features. The dimensions of both x and P change as features are added or deleted.

5.5.2 Initialization

The filter is initialised when the mobile-robot is in the starting position (z = 0, x = 0, φ = 0) at the origin of the world reference frame and aligned with it, and no known scene-features have yet been acquired. Since the state is then known with certainty:

x = [ 0 ]        P = [ 0  0  0 ]
    [ 0 ]            [ 0  0  0 ]
    [ 0 ]            [ 0  0  0 ]     5.31


5.5.3 State Estimation following a Movement

A new estimate of the state x̂ and covariance P, following movement of the mobile-robot over a constant time increment Δt designated by the label k, is:

x̂(k+1/k) = [ f_v(x̂_v(k/k), u(k)) ]
           [ y_1(k/k)            ]
           [ y_2(k/k)            ]
           [  :                  ]     5.32

P(k+1/k) = [ ∇f_v P_xx ∇f_v^T + Q(k)   ∇f_v P_xy1   ∇f_v P_xy2   ... ]
           [ P_y1x ∇f_v^T              P_y1y1       P_y1y2       ... ]
           [ P_y2x ∇f_v^T              P_y2y1       P_y2y2       ... ]
           [  :                          :            :              ]     5.33

where f_v and Q(k) are defined in Equations 5.11 and 5.12.

(Equation 5.33 is obtained from P(k+1/k) = ∇f_x P(k/k) ∇f_x^T + Q(k), where ∇f_x is the full state transition Jacobian of equation 5.34.)

∇f_x = [ ∇f_v  0  0  ... ]
       [ 0     I  0  ... ]
       [ 0     0  I  ... ]     5.34
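Because ∇f_x has the sparse form of equation 5.34, only the covariance blocks involving the vehicle change during prediction; the feature-feature blocks are carried over unchanged. The sketch below illustrates this block update for a row-major covariance array; the 3 × 3 vehicle Jacobian F (∇f_v) and process noise Q are assumed to have been computed elsewhere.

    /* Sketch of the sparse prediction step implied by equations 5.33-5.34. */
    static void mat3_mul(const double a[9], const double b[9], double c[9])
    {
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) {
                c[3*i+j] = 0.0;
                for (int k = 0; k < 3; k++) c[3*i+j] += a[3*i+k] * b[3*k+j];
            }
    }

    /* P is stored row-major with dimension d x d, where d = 3(n+1). */
    void predict_covariance(double *P, int n, const double F[9], const double Q[9])
    {
        int d = 3 * (n + 1);
        double tmp[9], blk[9], Ft[9];

        for (int i = 0; i < 3; i++)                      /* transpose of F */
            for (int j = 0; j < 3; j++) Ft[3*i+j] = F[3*j+i];

        /* P_xx <- F P_xx F^T + Q */
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) blk[3*i+j] = P[i*d+j];
        mat3_mul(F, blk, tmp);
        mat3_mul(tmp, Ft, blk);
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) P[i*d+j] = blk[3*i+j] + Q[3*i+j];

        /* P_x,yi <- F P_x,yi  and  P_yi,x <- (F P_x,yi)^T for each feature i */
        for (int f = 1; f <= n; f++) {
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++) blk[3*i+j] = P[i*d + 3*f + j];
            mat3_mul(F, blk, tmp);
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++) {
                    P[i*d + 3*f + j]   = tmp[3*i+j];
                    P[(3*f + i)*d + j] = tmp[3*j+i];     /* keep P symmetric */
                }
        }
    }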


5.5.4 Measurement and Feature Search

Recall that both the mobile-robot and scene-feature positions are specified in world reference coordinates; however, scene-features are physically measured relative to the head centre. The true ith feature position (relative to the head centre) is thus given by:

m_Gi = [ m_Gix ]   [ (X_i − x) cos φ − (Z_i − z) sin φ ]
       [ m_Giy ] = [ Y_i − M_H                         ]
       [ m_Giz ]   [ (X_i − x) sin φ + (Z_i − z) cos φ ]     5.35

An estimate of the ith feature position at the current mobile-base location is obtained by substituting the estimated values for the above variables (i.e., x̂_v and ŷ_i).

Following a measurement of this vector, the innovation covariance is:

S_mGi = ∇(m_Gi)_x P ∇(m_Gi)_x^T + R_i
      = ∇(m_Gi)_xv P_xx ∇(m_Gi)_xv^T + 2 ∇(m_Gi)_xv P_xyi ∇(m_Gi)_yi^T + ∇(m_Gi)_yi P_yiyi ∇(m_Gi)_yi^T + R_i     5.36

Note that in equation 5.36 the measurement noise R_i is expressed in Cartesian measurement space. For its transformation into angular form see section 5.5.5.

The process of measuring a scene-feature is as follows: from the above estimated relative scene-feature location, the head angles necessary for fixation are calculated (see Section 5.5.3) and the head moved to them. The vectors m_L and m_R and their covariances P_mL and P_mR are next calculated from the transformed S_mGi. At fixation both m_L and m_R will have zero x and y components, as the cameras' optic axes pass through the scene-feature.

For the left camera:

" i = - A ^ + " o and V i = - A % + V o m Lz m Lz

5.37

The establishment ofthe covariance matrix ofthe image vector u^ =

u. = vruj„,Pn.,vruj:^

u is obtained from \^L.

5.38


The Jacobian ∇(u_L)_mL contains mostly zero values, since ∂u_L/∂m_Ly = ∂v_L/∂m_Lx = 0 (and at fixation, where m_Lx and m_Ly are close to zero, the derivatives with respect to m_Lz also vanish):

∇(u_L)_mL = [ −f k_u / m_Lz     0               0 ]
            [ 0                 −f k_v / m_Lz   0 ]     5.39

U_L defines an ellipse in the left image whose size is governed by the chosen number of standard deviations. This defined area can be searched to locate the patch representing the scene-feature in question. The procedure is repeated in the right image. Restricting the search areas in this manner is computationally efficient as well as minimising mismatches. Following a successful feature match in both images, the final measurement of m_Gi is calculated.
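The search ellipse is obtained directly from the 2 × 2 image covariance U_L of equation 5.38. The sketch below shows one way of converting that covariance into ellipse axes and orientation (a standard eigen-decomposition of a symmetric 2 × 2 matrix); the scaling by n_sigma standard deviations is the only free choice.

    /* Sketch: image covariance U_L (equation 5.38) -> search ellipse.      */
    #include <math.h>

    void search_ellipse(const double U[4], double n_sigma,
                        double *major, double *minor, double *angle)
    {
        double a = U[0], b = U[1], c = U[3];      /* U = [a b; b c]          */
        double tr = a + c;
        double det = a * c - b * b;
        double disc = sqrt(tr * tr / 4.0 - det);
        double l1 = tr / 2.0 + disc;              /* larger eigenvalue       */
        double l2 = tr / 2.0 - disc;              /* smaller eigenvalue      */

        *major = n_sigma * sqrt(l1);              /* semi-axes in pixels     */
        *minor = n_sigma * sqrt(l2 > 0.0 ? l2 : 0.0);
        *angle = 0.5 * atan2(2.0 * b, a - c);     /* orientation of major axis */
    }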

5.5.5 State Vector Update following a Measurement

There are considerable computational savings if, prior to processing, a measurement of a scene-feature is transformed into angular coordinates (see Davison [Dav98, page 68]). This transformation of m_Gi, resulting in the measurement vector m_i, is given in equation 5.40.

m_i = [ α_i ]   [ tan⁻¹(m_Gix / m_Giz) ]
      [ e_i ] = [ tan⁻¹(m_Giy / m_Gip) ]
      [ γ_i ]   [ tan⁻¹(I / (2 m_Gi))  ]     5.40

where m_Gi is the scalar length of the vector m_Gi, m_Gip = √(m_Gix² + m_Giz²) is its projection onto the xz plane, and I is the inter-ocular separation of the head. These angles represent respectively the pan, elevation and vergence angles of an ideal active head positioned at the head centre and fixating the feature (i.e., head offsets = 0).
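A sketch of this conversion is given below. It follows equation 5.40 as reconstructed above (in particular the vergence expression tan⁻¹(I/(2 m_Gi))); the function and parameter names are illustrative only.

    /* Sketch of equation 5.40: head-centre Cartesian vector m_G = (mx,my,mz)
     * to ideal pan, elevation and vergence angles.  I is the inter-ocular
     * separation of the head. */
    #include <math.h>

    void cartesian_to_angular(double mx, double my, double mz, double I,
                              double *pan, double *elev, double *verg)
    {
        double mp = sqrt(mx * mx + mz * mz);     /* projection onto xz plane */
        double mg = sqrt(mx * mx + my * my + mz * mz);

        *pan  = atan2(mx, mz);                   /* alpha_i */
        *elev = atan2(my, mp);                   /* e_i     */
        *verg = atan(I / (2.0 * mg));            /* gamma_i */
    }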

The advantage obtained by the use of angular measurements is that the measurement noise can be represented as a constant diagonal matrix. For a successful fixation, lock-on is achieved if the feature is located within a specified radius of the principal point (typically two pixels) in both images. For a specified radius of two pixels this represents an angular uncertainty of around 0.3°, and hence errors with this standard deviation (assumed Gaussian) can be assigned to α_i, e_i, γ_i (i.e., Δα_i, Δe_i, Δγ_i). The measurement noise covariance matrix R is given in equation 5.41. It is a diagonal matrix in which α_i, e_i, γ_i are independent variables, making it simpler to represent the measurement noise as Gaussian and consequently eliminating potential bias during the filter update. In addition, the measurement vector m_i can be decoupled, permitting scalar measurements to be used to update the filter. Computationally this has the substantial advantage that matrix inversion is not required in updating the filter.


R = [ Δα_i²   0       0     ]
    [ 0       Δe_i²   0     ]
    [ 0       0       Δγ_i² ]     5.41

The Jacobian for each scalar component m_i of the measurement (where m_i is one of α_i, e_i, γ_i) is non-zero only in the blocks relating to the vehicle state and to the current scene-feature:

∇(m_i)_x = [ ∇(m_i)_xv   0  ...  ∇(m_i)_yi  ...  0 ]

The scalar innovation covariance S is calculated from equation 5.42, in which P_xx, P_xyi and P_yiyi are 3 × 3 blocks of the current state covariance matrix P, and R is the scalar measurement noise variance (Δα_i², Δe_i² or Δγ_i²) of the measurement:

S = ∇(m_i)_x P ∇(m_i)_x^T + R
  = ∇(m_i)_xv P_xx ∇(m_i)_xv^T + 2 ∇(m_i)_xv P_xyi ∇(m_i)_yi^T + ∇(m_i)_yi P_yiyi ∇(m_i)_yi^T + R     5.42

The Kalman gain W is then calculated from equation 5.43, enabling the filter to be updated (equations 5.44 and 5.45). Note that since S is scalar, S⁻¹ = 1/S, so no matrix inversion is required:

W = P ∇(m_i)_x^T S⁻¹ = S⁻¹ [ P_xx  ∇(m_i)_xv^T + P_xyi  ∇(m_i)_yi^T ]
                           [ P_y1x ∇(m_i)_xv^T + P_y1yi ∇(m_i)_yi^T ]
                           [ P_y2x ∇(m_i)_xv^T + P_y2yi ∇(m_i)_yi^T ]
                           [  :                                     ]     5.43

In equation 5.44, z_i is the actual measurement component obtained from the head and m_i the corresponding predicted component:

x̂_new = x̂_old + W (z_i − m_i)     5.44

P_new = P_old − W S W^T     5.45
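The sketch below illustrates this scalar update for a flat, row-major covariance array, following the form of equation 5.42 as given above. The two 1 × 3 Jacobian blocks a = ∇(m_i)_xv and b = ∇(m_i)_yi, the feature index fi, the scalar innovation nu = z_i − m_i and the scalar noise variance R are assumed to have been computed elsewhere; no matrix inversion appears anywhere.

    /* Sketch of the scalar update of equations 5.42-5.45.  P is d x d,
     * row-major; x is the full state vector; d = 3(n+1). */
    #include <stdlib.h>

    void scalar_update(double *x, double *P, int d, int fi,
                       const double a[3], const double b[3],
                       double nu, double R)
    {
        int yo = 3 * fi;                 /* offset of feature block in state */
        double S = R;
        double *W = malloc(d * sizeof *W);

        for (int i = 0; i < 3; i++)                          /* equation 5.42 */
            for (int j = 0; j < 3; j++) {
                S += a[i] * P[i * d + j] * a[j];             /* a Pxx a^T     */
                S += 2.0 * a[i] * P[i * d + yo + j] * b[j];  /* 2 a Pxyi b^T  */
                S += b[i] * P[(yo + i) * d + yo + j] * b[j]; /* b Pyiyi b^T   */
            }

        for (int j = 0; j < d; j++) {    /* W = P h^T / S   (equation 5.43)  */
            double s = 0.0;
            for (int k = 0; k < 3; k++)
                s += P[j * d + k] * a[k] + P[j * d + yo + k] * b[k];
            W[j] = s / S;
        }

        for (int j = 0; j < d; j++) x[j] += W[j] * nu;       /* (5.44) */
        for (int j = 0; j < d; j++)                          /* (5.45) */
            for (int l = 0; l < d; l++) P[j * d + l] -= W[j] * S * W[l];

        free(W);
    }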

5.5.6 New Feature Initialization

On the first observation of a new scene-feature i, the vector m_Gi relative to the head centre is obtained and the state of the new feature y_i initialised as:


y_i = [ X_i ]   [ x + m_Gix cos φ + m_Giz sin φ ]
      [ Y_i ] = [ M_H + m_Giy                   ]
      [ Z_i ]   [ z − m_Gix sin φ + m_Giz cos φ ]     5.46

The total state vector x and covariance P are updated to:

x_new = [ x_v ]
        [ y_1 ]
        [ y_2 ]
        [  :  ]
        [ y_i ]     5.47

P_new = [ P_xx             P_xy1            ...   P_xx ∇(y_i)_xv^T                                           ]
        [ P_y1x            P_y1y1           ...   P_y1x ∇(y_i)_xv^T                                          ]
        [  :                :                      :                                                          ]
        [ ∇(y_i)_xv P_xx   ∇(y_i)_xv P_xy1  ...   ∇(y_i)_xv P_xx ∇(y_i)_xv^T + ∇(y_i)_mGi R_i ∇(y_i)_mGi^T ]     5.48
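A minimal sketch of the position part of this initialisation (equation 5.46, the inverse of equation 5.35) is given below; the augmentation of P with the row, column and diagonal blocks of equation 5.48 would follow once the Jacobians ∇(y_i)_xv and ∇(y_i)_mGi have been evaluated.

    /* Sketch of equation 5.46: new feature position from the current vehicle
     * estimate (z, x, phi) and the measured head-centre vector (mgx,mgy,mgz).
     * M_H is the fixed height of the head centre above the ground plane. */
    #include <math.h>

    void init_feature(double z, double x, double phi, double M_H,
                      double mgx, double mgy, double mgz, double y[3])
    {
        y[0] = x + mgx * cos(phi) + mgz * sin(phi);   /* X_i */
        y[1] = M_H + mgy;                             /* Y_i */
        y[2] = z - mgx * sin(phi) + mgz * cos(phi);   /* Z_i */
    }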

5.5.7 Scene-Feature Deletion

Deleting the ith scene-feature simply requires its removal from the state vector, together with the corresponding row and column of the covariance matrix. The example below applies for i = 2.

x = [ x_v ]        [ x_v ]
    [ y_1 ]   →    [ y_1 ]      and
    [ y_2 ]        [ y_3 ]
    [ y_3 ]

P = [ P_xx    P_xy1    P_xy2    P_xy3  ]        [ P_xx    P_xy1    P_xy3  ]
    [ P_y1x   P_y1y1   P_y1y2   P_y1y3 ]   →    [ P_y1x   P_y1y1   P_y1y3 ]
    [ P_y2x   P_y2y1   P_y2y2   P_y2y3 ]        [ P_y3x   P_y3y1   P_y3y3 ]
    [ P_y3x   P_y3y1   P_y3y2   P_y3y3 ]     5.49
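In an implementation this amounts to compacting the state array and the covariance array in place, as sketched below for a row-major covariance of dimension d (a minimal illustration, not the SCENE code itself).

    /* Sketch of equation 5.49: delete feature i (3 state elements and the
     * corresponding 3 rows and columns of P).  d is the current dimension. */
    void delete_feature(double *x, double *P, int d, int i)
    {
        int off = 3 * i;                     /* first deleted index */

        /* compact the state vector */
        for (int j = off; j < d - 3; j++) x[j] = x[j + 3];

        /* compact the covariance matrix, skipping the deleted rows/columns */
        for (int r = 0, rs = 0; r < d; r++) {
            if (r >= off && r < off + 3) continue;
            for (int c = 0, cs = 0; c < d; c++) {
                if (c >= off && c < off + 3) continue;
                P[rs * (d - 3) + cs] = P[r * d + c];
                cs++;
            }
            rs++;
        }
    }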

5.5.8 World-Coordinate Frame Zeroing

For circumstances in which it is more convenient to use the current mobile-robot position as the reference coordinate frame, this can be accomplished at any stage by simply zeroing the world coordinate frame and assigning the new state to:

x_new = [ x_v,new ]   [ 0           ]
        [ y_1,new ] = [ m_G1 + M_H  ]
        [ y_2,new ]   [ m_G2 + M_H  ]
        [  :      ]   [  :          ]     5.50

m_Gi is the vector to the ith feature from the head centre, and M_H the constant vertical offset vector from the ground plane to the head centre. The corresponding state covariance and Jacobian matrices are calculated using equations 5.51 and 5.52.

∇(x_new)_x,old = [ 0   0   0   ... ]
                 [ ·   ·   ·       ]
                 [ ·   ·   ·       ]     5.51

where the first (vehicle) row is zero, since the new vehicle state is identically zero, and the remaining blocks are the Jacobians of equation 5.50 with respect to the old state.

P_new = ∇(x_new)_x,old P_old ∇(x_new)_x,old^T     5.52


5.6 Map-Building

With the above analysis, the framework for estimating the location of the moving mobile-robot and of observed stationary features is now available. The tracking of a scene-feature proceeds as follows:

Assume that a target feature has been fixated during a measurement phase of the filter. At the commencement of movement, the prediction step of the filter provides an estimate of the new position of the mobile-robot on reaching the next measurement point, as well as the expected relative position of the target feature. Using this information the stereo head is driven to fixate on the feature in readiness for the next measurement at the end of the current time increment.

The actions required (sketched in code below) are:
• Using the current control inputs, carry out the prediction step and provide an estimate of the robot's new position.
• Select the feature for tracking and calculate the estimated head angles for fixation. Move the head to this position in preparation.
• Move the robot to the new calculated position based on its odometry.
• Capture new images.
• Perform feature matching in both images using correlation as in section 5.2.1.
• If a match is found, carry out a measurement and update the filter.

It is sensible to make repeated fixated measurements of a single feature as the robot moves past; the issue is when to search for a new feature or track an alternative feature. It is not difficult to envisage situations arising where a large number of features with full covariance need to be stored and maintained during navigation. The computational overhead of updating the filter at each step would significantly impinge on the filter's ability to perform real-time tracking. A solution that does permit continuous tracking without compromising the integrity of the filter is: during motion, update only those parts of the filter that are required for the current tracking task. A full update of the filter can subsequently be carried out when the robot stops moving at the next step, as ample processing time is then available.

The previous expressions lend themselves very efficiently to this type of update, because only those parts of the state vector and covariance matrix which are directly involved in the tracking process at that time need to be updated. These are the estimated states of the sensor and the observed feature, x_v and y_i, and the covariance elements P_xx, P_xyi and P_yiyi.

The algorithmic development for the continuous tracking of multiple features is continued in Appendix C. There it is explained that by storing a small amount of information at each transition step, the state and covariances can be updated in a generic way at the completion of each tracking motion. A criterion is also provided as to which feature to track during navigation; this is referred to as the "Vs criterion", and is based on the known innovation covariance values of the features under consideration.

Further details of the map-building process and the strategies adopted are outlined in Section 5.7.


5.6.1 Distance Increments in Place of Time Increments

In the preceding analysis, events are governed by constant time increments Δt. However, a more useful increment for triggering measurements is vehicle odometry (i.e., the trajectory is divided into steps of fixed wheel-encoder counts; for this work the step size nominally used is 20 cm). Distance increments are preferable to time increments, as they eliminate the accumulation of errors due to velocity fluctuations during a time interval.

5.6.2 Advantage of Known Features

Incorporating known features into the map-building process has considerable advantages, akin to using recognised landmarks to reinforce one's own position when travelling in relatively unfamiliar territory away from home. The process of acquiring a known feature consists of manually driving the robot to a suitable position and pointing the head in the direction of a likely suitable feature (i.e., one with high contrast). If the feature is successfully matched and measured, its image patch is saved. Several examples of such feature patches are shown in Fig 5.6.2.1. The saved patch, together with its world coordinates and the robot's position, is stored in a file of known features. Known features are initialised into the state vector in the same way as naturally found features (i.e., as feature i, having coordinates y_i). The covariance P_yiyi, however, is set with all elements equal to zero, along with the cross-covariances between the feature state and those of the vehicle and other features. This is justified as it is assumed that the location of the known feature is 100 percent certain.


Fig 5.6.2.1 Examples of saved known features: (a) light fitting ends and (b) fire extinguisher sign

5.6.3 Simplified Mobile Base Trajectory Motions

Motion of the mobile base is governed by the control vector u = (v1, v2)^T, and while the analysis in the preceding sections places no restriction on the movement of the mobile base, motions comprising pure rotations and linear path trajectories are the simplest and most effective in practice. A pure rotation is accomplished by setting v1 = −v2, while for translation v1 = v2. In tests of the mobile base navigating to designated waypoints, the magnitude of the steady-state velocity is set to a constant value, with the angular rotations and the number of steps the only trajectory variables required to be calculated by the software controller. Generally, trajectories from one waypoint to the next consist of an initial rotation about the central axis of the base, a translation


directly to the waypoint, followed by a further rotation about the central axis of the base to align it with the specified orientation. Occasionally, when differences exist between the odometric and the feature-measurement-based position estimates, the assumption is made that the odometry is in error and corrective movements are made.
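A minimal sketch of this rotate-translate-rotate decomposition is given below. The sign and axis conventions (heading φ measured from the world Z axis towards X) are assumptions for illustration, and the 20 cm step follows the distance increment of section 5.6.1.

    /* Sketch: decompose a waypoint leg into rotation, n translation steps,
     * and a final rotation to the requested orientation. */
    #include <math.h>

    void plan_leg(double z, double x, double phi,        /* current pose    */
                  double zw, double xw, double phiw,     /* target waypoint */
                  double *rot1, int *n_steps, double *rot2)
    {
        const double step = 0.20;                        /* 20 cm increments  */
        double heading = atan2(xw - x, zw - z);          /* bearing to waypoint */
        double dist = hypot(xw - x, zw - z);

        *rot1 = atan2(sin(heading - phi), cos(heading - phi));   /* wrap to +/- pi */
        *n_steps = (int)ceil(dist / step);
        *rot2 = atan2(sin(phiw - heading), cos(phiw - heading));
    }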

5.7 Map-Building Strategy Adopted

The final map-building and tracking strategy adopted, and the rationale for the decision-making process, are outlined in this section. These stem from the derivations detailed in Appendix C.

The map-building process commences when the robot is about to move for the first time. In searching for suitable scene-features, in an effort to ensure a wide selection is found, the stereo head points slightly up and is first directed to the left, then straight ahead and finally to the right. In principle, during this search up to three new features could be found and added to the map. While this is unlikely, given that suitable features are somewhat scarce, the rule that a new feature be added if fewer than 2 features are visible at any given location has been adopted. This rule ensures a sparse map is created. Furthermore, in maintaining the map it is important that obsolete features be removed; features are deleted if, out of 10 measurement attempts, more than half are unsuccessful.

Which features are subsequently chosen for state vector updating is determined by assessing the uncertainty of their position relative to the robot. This measure of uncertainty is obtained by comparing the scalar space-volume measure Vs, which is related to the innovation covariance values of the feature. The principle adopted is that the 'best' feature to measure (for the long-term integrity of the map as a whole) is the one with the least certainty, as the measurement process will improve its estimated position, thus reducing the overall uncertainty of the state vector. For instance, if a feature is known with zero uncertainty, a further measurement of that feature cannot improve the uncertainty of its position. While tracking a feature, however, some cost needs to be attributed to saccading to a new feature, as each saccade encroaches on the total navigation time.

The approach adopted is to decide, while the robot is moving, whether to switch to a new feature or to keep tracking the existing one. The choice is based on the predicted robot and map state that would exist in a short future interval of time if (a) the saccade and measurement of the chosen feature were made, or (b) tracking of the existing feature continued. The "short future interval of time" is the estimated time it would take to saccade to and measure any of the features.

Formally the decision process is implemented according to:

1. N_i, the number of measurements that would be lost while saccading to each of the visible features, is calculated. To determine this, an estimate of the head movement time to correctly move to and locate each feature is required.

2. Ascertain the largest N_i (N_max); this represents the largest number of measurements lost during the longest saccade.

3. If an immediate saccade is to be made, estimate for each feature i the state that would exist after N_max + 1 steps.

4. Evaluate Vs(max) for each of the above estimated states and saccade to the feature with the lowest Vs(max), unless the Vs(max) of the feature currently being tracked is already the lowest.

5.8 Software Development

5.8.1 Development and Modification of Navigation Software

As previously stated, the development of the navigational software is based on SCENE [Dav01]. The motion models outlined in this chapter and Appendix C were developed for the TRC mobile base and BiSight head, and these replaced SCENE's existing "default" models where appropriate. Furthermore, because of the trajectory simplifications outlined in Section 5.6.3, the "Control Parameter Section" of the interface GUI was reduced to contain only the Displacement (m), Steering Increment (degrees) and Number of Steps. (The Velocity, Turret Increment and Time Step slide-adjustment controls were removed.)

In addition to the above and title name changes, the following functions and associated interface buttons were added: (a) Obstacle Avoidance (Navigate Around Obstacles), (b) Recognition.

The interface GUIs are shown in Figs 5.8.1.1 to 5.8.1.3.

5.8.1.1 Navigating to Specified Destinations (Waypoints).

The Waypoint Browser (Fig 5.8.1.3) holds the locations of sequential stopping destinations. Navigation to the next waypoint commences on a mouse click of the button "Navigate to Next Waypoint" (Fig 5.8.1.2); in this case the route is assumed obstacle free. Scanning for target objects commences on a mouse click of the button "Recognition" (Fig 5.8.1.2).

5.8.1.2 Obstacle Avoidance Function.

If obstacles are likely, then instead of clicking "Navigate to Next Waypoint" the button "Navigate Around Obstacles" is selected. With this selected, prior to the commencement of each step the head tilts down and checks for an obstacle. If one is detected within a depth of 1 metre, an additional waypoint is automatically inserted. Fig 5.8.1.4 illustrates this process.

Since the search for obstacles occurs at each step, the time taken to navigate between waypoints increases proportionally with the distance travelled.


5.8.1.3 Dead Reckoning Option

A command-line option has also been provided for use when neither mapping nor obstacle avoidance is required (i.e., SLAM not required). For this "Dead Reckoning Option", the base simply relies on its odometry encoder values for localisation. It is the fastest mode of path navigation, and is quite reliable when the terrain is a level floor. The flow chart of Fig 5.8.1.5 summarises the options.

Fig 5.8.1.1 Head control GUI (buttons include Grab Images, Resize Images, Show Head Odometry, Feature Acquisition, Initialise Manually-Selected Point Feature and Initialise Automatically-Selected Point Feature)


Fig 5.8.1.2 Navigation GUI

Fig 5.8.1.3 Waypoint browser GUI (lists the waypoint coordinates, with Add Before, Add After and Delete buttons)


If an obstacle is detected, an additional waypoint is inserted to the right or left side of the obstacle, depending on the perceived space; the dotted lines indicate the path travelled.

Fig 5.8.1.4 New waypoint creation for obstacle avoidance


Fig 5.8.1.5 Flow chart illustrating SLAM or dead-reckoning modes: (a) choose the target object-model from the database and input the destination Z, X, φ; (b) navigate to the specified Z, X, φ using robot odometry, or with the SLAM module activated; (c) activate the recognition module and scan the scene with the active stereo head; (d) on recognition, request docking of the robot arm, drive the arm to grasp the object using its estimated pose, and place the object on the vehicle; (e) navigate back home (using SLAM if the SLAM module is activated); otherwise await further instructions.


5.9 Data Measurement Examples

Examples of data measurements, for a map size of 2 features, at step 3 are given below. Note that the number of significant figures printed does not reflect the accuracy of the data.

Step 3

No of features 2

State vector x =

0.884382 0.993516 0.0305666 1.04299 1.04177 6.24433 -1.04419 1.04236 4.1569

Covariance Matrix

P = [  0.000472303   0.000224005  -2.74943e-05   8.00617e-05   5.45943e-05   0.000602076  -0.00023809    0.000196054   0.000953954
       0.000224005   0.000289892  -2.07999e-05   0.000187995   0.000195976   0.00127189   -0.000198764   0.00020446    0.000867573
      -2.74943e-05  -2.07999e-05   1.33371e-05   3.07865e-05   1.31344e-05   5.2439e-05    5.32473e-07   3.16899e-06   6.16399e-06
       8.00617e-05   0.000187995   3.07865e-05   0.00036703    0.000285048   0.00166124   -0.000203065   0.000229519   0.00093282
       5.45943e-05   0.000195976   1.31344e-05   0.000285048   0.000701177   0.00211657   -0.000166389   0.000184586   0.000733035
       0.000602076   0.00127189    5.2439e-05    0.00166124    0.00211657    0.0124287    -0.00109314    0.00117172    0.00475933
      -0.00023809   -0.000198764   5.32473e-07  -0.000203065  -0.000166389  -0.00109314    0.000313161  -0.000255231  -0.00107892
       0.000196054   0.00020446    3.16899e-06   0.000229519   0.000184586   0.00117172   -0.000255231   0.000393117   0.000984229
       0.000953954   0.000867573   6.16399e-06   0.00093282    0.000733035   0.00475933   -0.00107892    0.000984229   0.00420907 ]

Eigenvalues associated with each feature measurement

0.000112516 4.36763e-05 4.2331 le-06 0.000110681 4.42953e-05 4.26137e-06

Best score found: 6.0545e-07. Auto-selected feature with label 1.

Covariance matrix of noise

R = [ 4e-06   0        0
      0       0.0001   0
      0       0        4e-06 ]

Innovation Covariance Matrix

S = [  4.42963e-05  -3.15737e-07  -1.4381e-07
      -3.15737e-07   0.000110679   4.00568e-08
      -1.4381e-07    4.00568e-08   4.2619e-06 ]


5.10 Conclusion

With the navigational formulations in this chapter complete, the remaining algorithms that still require development relate to:

• target object-modeling,
• scene-modeling,
• target object recognition,
• target object retrieval, using the docked robot arm.

Each of these tasks, together with the provision of a "3D Virtual Environment" for simulating MROLR, are considered in the following chapter.


Chapter 6 Object Modeling, Recognition and Retrieval

6.0 Introduction

This chapter considers object and scene modeling, recognition, and the retrieval of the located object from the scene.

Section 6.1 outlines an automated 3D object-model production facility, designed to produce an object-model database comprising all objects that MROLR is likely to be requested to locate and retrieve on command. Section 6.2 covers the development of a 3D object-model editor. Sections 6.3 to 6.5 outline the development of the object recognition module, which is responsible for matching the requested object with objects found in the scene. For a successful match, the appropriate transform (and consequently the object's pose) to transport the object-model into the scene is also provided.

Sections 6.6 and 6.7 deal with the docking of the robot arm and the establishment of the additional transforms necessary to guide the arm towards the specified object's grasp point and finally retrieve the located object.

Section 6.8 discusses the development of a "3D Virtual Environment" for simulating MROLR.

6.1 Automated Model Creation

The recognition process involves the matching of an a priori specified object-model with a scene-model. Methods of building object-models were reviewed in Chapter 2, where it was concluded that for the current application a 3D wire-frame (edge-based) model was a suitable choice (section 2.2.7). Reasons for this choice of modeling are:

(a) Matching is carried out in 3D Euclidean space, which ensures angular and linear measurement invariance (i.e., relative angles between lines, and arc lengths, remain fixed). This is essential, as the object in the scene will in general have an unknown orientation.

(b) Matching in 3D Euclidean space permits positional information to be calculated in terms of a homogeneous transform giving the orientation and translation of the matched scene-object relative to the model-object. This transform is available for transporting the model into the scene, allowing visual verification of the quality of the match. It is also a necessary component in obtaining the resulting transform that guides the robot arm towards the object in the scene for retrieval.

(c) It requires only sparse data matching, thereby keeping search and computational costs to a minimum. Given the large number of items present in any given scene, non-sparse data modeling would be prohibitive.


One obvious disadvantage of this modeling approach is the small number of available features for matching (i.e., lines and conies). Additional local characteristics such as object colour, texture and surface properties could be added to the list of features to increase the robustness of the matching process. The addition of these will be considered for future work when faster processors should become available.

The following briefly describes the need for models created from several views and then describes the methodology of the model-creation process. If two images of a scene are captured using a pair of calibrated cameras, edge-based binocular stereo triangulation can be used to obtain a partial 3D edge outline of the scene as depicted in the left (or right) camera image. This is of limited use for 3D recognition, however, as local features associated with the object (or scene) are only depicted from one view. Thus if a solid cube object (Fig 6.1.1) has the letter 'A' printed on the front and 'B' on the back, the letter 'B' may not be visible from a particular view. If both letters were required for recognition, the block would not be identifiable.

Fig 6.1.1 Features remain static w.r.t object's frame

A 3D outline ofthe complete cube (i.e., hidden edges included) on the other hand, would ensure all its surface features were visible, and would thus constitute a suitable object-model for recognition.

To facilitate automated 3D object-model creation, a PC-controlled rotating horizontal table and two cameras mounted on extended rails were used to obtain a sequence of stereo images of the desired objects (Fig 6.1.2). Two smaller portable PC-controlled tables, described below, were also subsequently built and used.

Fig 6.1. 2 Models created on computerised rotating table


From each stereo pair of images corresponding to a rotated view, a 3D partial edge outline was derived and then integrated to form a complete model. The coordinate frame for each view was (by default) with respect to the left camera's optical centre. Merging the independent views requires all reconstructions to be with respect to a unique (although arbitrary) coordinate frame attached to the object. As the object rotates with the table, so too does its coordinate frame, thereby ensuring the object's physical dimensions and features remain static with respect to that frame (Fig 6.1.1); they would however change with respect to any stationary reference frame. The world reference was chosen so that its Z-axis coincided with the table's central axis of rotation. The object's arbitrary coordinate frame is aligned with the world coordinate frame for a table rotation angle of zero.

As the final object reconstruction is required with respect to its own coordinate frame, two transformations of the 3D edgels are required: the first from the left camera coordinate frame to the static world reference frame, and then a simple rotational transformation (about the world Z-axis) to transform the edgels into the object's coordinate frame. Using a calibration tile (Fig 3.1.2) whose Z-axis also coincides with the table's centre, and knowledge of the subsequent angles of table rotation, enables the required 4 × 4 (homogeneous) left-camera-to-world and world-to-object transformation matrices to be determined (a sketch of this composition follows the list below). Each edgel of the reconstruction is subsequently transformed into the desired rotating reference frame and combined to form the 3D model. The final stage of model creation requires the manual formation of:

(a) Groups (termed cliques) consisting of a distinctive focus feature (such as an arc length or long line) and any number of surrounding features in close proximity.

(b) Grasping location data. Each object-model is given a nominal (X, Y, Z) point together with optional yaw, pitch and roll guidance angles. This kinematic information, in the form of a homogeneous transform, is supplied to the robot arm at the time of retrieval.
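The composition of the two edgel transformations mentioned above is sketched below. The sign convention of the table rotation is an assumption for illustration; T_wc is the camera-to-world transform recovered from the calibration tile.

    /* Sketch: compose camera->world with a rotation about the world Z axis
     * by the table angle theta, giving a camera->object transform. */
    #include <math.h>

    static void mul44(const double a[16], const double b[16], double c[16])
    {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++) {
                c[4*i+j] = 0.0;
                for (int k = 0; k < 4; k++) c[4*i+j] += a[4*i+k] * b[4*k+j];
            }
    }

    /* T_oc = Rz(-theta) * T_wc maps camera-frame edgels into the rotating
     * object frame for a table angle theta (radians). */
    void camera_to_object(const double T_wc[16], double theta, double T_oc[16])
    {
        double Rz[16] = { cos(theta),  sin(theta), 0, 0,
                         -sin(theta),  cos(theta), 0, 0,
                          0,           0,          1, 0,
                          0,           0,          0, 1 };   /* Rz(-theta) */
        mul44(Rz, T_wc, T_oc);
    }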

The clique size and number are essentially determined by the availability of features; moreover, a focus feature in one clique may be used as a surrounding feature of another clique. Cliques are stored and associated with the model, and they provide the features to be matched during the recognition phase. Further details are given in section 6.3. Examples of cliques are given in Fig 7.1.12. Fig 6.1.3 depicts created 3D models of a cup and an alphabet cube.

The above procedure was coded and added to TINA's Matcher tool source code. Further details are provided in Appendix B. Initially, three objects (an alphabet cube, a mug and a teapot) were produced for test purposes. These models comprised a simplified database of objects to be recognised in a scene. Fig 6.1.4 shows the stereo images of a scene containing these three objects and the subsequent scene-model produced from the images.

For most cases gripper orientation angles of 0° or 90° are specified.



Fig 6.1.3 3D created model of (a) cup and (b) cube


Fig 6.1. 4 (a) Stereo images of scene (b) 3D scene model

Portable Tables for Model Creation

Two portable computer-controlled tables, considerably smaller than the one shown in Fig 6.1.2, were later designed and constructed: one light and easy to move (Fig 6.1.5a), the other robust and fitted with level and height adjustment (Fig 6.1.5b). These permit the cameras mounted on MROLR's stereo head to be used for automated model creation. The main benefits of these tables are their portability and the elimination of the need for a duplicate set of cameras. Subsequent models were created using these. Appendix B provides additional details.


(a) Two views of the lightweight model-creation turntable

(b) Two views of the heavy-duty model-creation turntable with adjustable height and level

Fig 6.1.5 Portable model-creation tables


6.2 Model Enhancement - Model Editor

A Model Editor was developed to facilitate the manual removal and addition of 3D segments in formed models. This editor utilises many of the 3D geometric list-structure library functions provided with TINA, in particular those associated with TINA's geomstat tool. At the commencement of this work the geomstat tool was incomplete, having no input/output stream facility; however, numerous functions for 3D line manipulation were already coded, and a relatively small amount of supplementation made it usable as a model editor. This additional code was added and linked to form a module titled "Model Editor".

Editing of models is desirable on occasion, as double lines or conics arise as a result of camera lens distortion and/or calibration inaccuracies, or lines are fragmented or omitted altogether due to occlusions. An example of the editor's application to one of the test cubes is shown in Fig 6.2.1. While such enhancements are aesthetically pleasing, they add processing overhead and in general are found to provide little or no improvement in recognition capability. This stems from the fact that scene-models (produced while panning and therefore not available for editing) frequently contain broken or missing edge segments (Fig 6.1.3).

Fig 6.2.1 Application of the model editor to a raw 3D cube model: (a) editor (edited model, conics not displayed) and (b) resulting model


6.2.1 TINA'S Geometric List Structures

Five 3D geometric list structures used throughout much of this work, in particular for model creation, editing and object recognition, are: vectors, points, lines, planes and conics (i.e., curve geometry such as arcs, ellipses, circles, etc.). TINA's software infrastructure encourages the use of these list structures. Details (extracted from TINA's User Guide) are given in Fig 6.2.2.

typedef struct vec3 {
    Ts_id ts_id;            /* Tina structure identifier */
    float el[3];            /* point in 3D coordinates   */
} Vec3;

typedef struct line3 {
    Ts_id ts_id;            /* Tina structure identifier */
    unsigned int type;
    unsigned int label;
    struct vec3 p1, p2;     /* end points                */
    struct vec3 p;          /* point on line             */
    struct vec3 v;          /* direction vector          */
    float length;           /* line length               */
    struct list *props;     /* property list used to associate
                               extra specific information */
} Line3;

typedef struct plane {
    Ts_id ts_id;            /* Tina structure identifier */
    unsigned int type;
    unsigned int label;
    struct vec3 p;          /* point on the plane        */
    struct vec3 n;          /* a normal to the plane     */
    struct list *props;     /* property list used to associate
                               extra specific information */
} Plane;

typedef struct conic {
    Ts_id ts_id;            /* Tina structure identifier */
    unsigned int type;
    unsigned int label;
    int filler1;            /* SUN platform dependence parameter fix */
    double a, b, c, d, e, f;     /* algebraic formulae   */
    double theta, alpha, beta;
    struct vec2 center;
    int filler2;            /* SUN platform dependence parameter fix */
    double t1, t2;          /* conic params of p1 and p2 */
    int branch;             /* for hyperbola only        */
    List *props;            /* property list used to associate
                               extra specific information */
} Conic;

typedef struct conic3 {
    Ts_id ts_id;            /* Tina structure identifier */
    int type;
    struct conic *conic;    /* describes the conic in the plane */
    struct vec3 origin;     /* origin point on plane     */
    struct vec3 ex, ey, ez; /* define x, y, z axes: ez = normal */
} Conic3;

The coordinate frame in which the conic lies is at origin and has axes ex, ey and ez. Note that the 2D conic lies in the xy plane.

In conic, the arc covered by the curve is between angles t1 and t2 in radians. All angles are positive, with 0 along the x axis and π/2 along the y axis. The arc goes from t1 to t2 in strictly ascending order; t1 will be less than 2π, but t2 may be larger than 2π if the arc passes back through the x axis. The implicit algebraic equation that represents its form is ax² + 2bxy + cy² + 2dx + 2ey + f = 0. Theta is the angle of the major axis; alpha and beta are the lengths of the semi-major and semi-minor axes respectively.

The conic type can be any one of: ELLIPSE, HYPERBOLA, DEGENERATE.

The field branch is used to differentiate which part of the hyperbola the data lies on.

Fig 6.2.2 TINA 3D geometric list structures


6.3 Recognition Algorithm

The recognition algorithm in TINA's Matcher-tool module, written by Pollard et al. [PPMF87], [PPM91], was extended by the addition of code both to automate the recognition process and to include it as an integrated module of MROLR. This was necessary to facilitate the autonomous operation of the system. Recognition is based on pairwise matching of primitive geometrical features associated with a model and the scene (grouped into cliques). For object-models, cliques are manually chosen following their creation (refer to section 6.1), and consist of a distinctive focus feature (such as an arc length or long line segment) and a number of surrounding local features in close proximity. Object models in general require sufficient cliques so that adequate features may be matched in a scene, irrespective of the object's orientation. Focus features provide a basis for determining maximal consistency of neighbouring elements within a clique. Features comprise 3D line segments and/or 3D conic segments.

During matching, three pairwise relationships are tabulated: (a) orientation, (b) minimum distance apart, and (c) distance from the point of minimum distance apart. These geometrical relationships are illustrated for a line feature in a model and a scene in Fig 6.3.1. TINA 3D straight-line segment structures, in addition to an Id label identifier, are represented by the quadruple (l1, l2, e1, m1) (see Fig 6.2.2 for code details). These parameters represent the two end vector points l1 and l2, the direction vector between them e1, and the centroid of the line, its midpoint m1 = (l1 + l2)/2.

Fig 6.3.1 Matching of line segments

(a) Orientation difference α between two potential matches: this is obtained from the dot product of e1 and e2, i.e., α = cos⁻¹(e1 · e2).

(b) Minimum distance apart between (extended) lines d:


This is represented by the unit vector (normal to each line) d, where d = (e1 × e2)/|e1 × e2|, and the scalar distance between the lines h, where h = (m2 − m1) · d. If the line segments are close to parallel, the distance used is the perpendicular distance between the lines: h = |(m2 − m1) − [(m2 − m1) · e1] e1|.

(c) Distance from the point of minimum distance apart, s, t, q: for non-parallel lines, the distances along the lines to the start and finish of each line, from the point of minimum distance apart, are used. From the above, the vector between the points of minimum distance apart is given by h = h d. Moving m2 to m'2 by adding −h produces a line (e2, m'2) such that lines (e1, m1) and (e2, m'2) are coplanar and meet at the point of minimum distance apart on (e1, m1).

q1 = {(e2 × e1) · [e2 × (m'2 − m1)]} / |e2 × e1|² is the signed distance to that point from m1 in the direction e1. The scalar distances from l1,1 and l1,2 to that point are given by

s1 = q1 + (m1 − l1,1) · e1   and   t1 = q1 + (m1 − l1,2) · e1, respectively.

Similarly, the distances to the point of minimum distance apart along line 2 are

s2 = q2 + (m2 − l2,1) · e2   and   t2 = q2 + (m2 − l2,2) · e2.

Prospective matches for each pair of elements {(s_i, t_i), (s_{i+1}, t_{i+1})} from the object-model description can be tested for geometrical correspondence with each pair of elements in the scene model.
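The pairwise quantities above reduce to a handful of vector operations. The sketch below implements α, the minimum distance h and the signed distance q1 directly from the formulae given in the text, with lines represented by their end points as in TINA's Line3 structure; the helper names are illustrative only.

    /* Sketch of the pairwise line relationships used by the matcher. */
    #include <math.h>

    typedef struct { double x, y, z; } Vec3;

    static Vec3   vsub(Vec3 a, Vec3 b) { Vec3 r = { a.x-b.x, a.y-b.y, a.z-b.z }; return r; }
    static double vdot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3   vcross(Vec3 a, Vec3 b)
    { Vec3 r = { a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x }; return r; }
    static Vec3   vunit(Vec3 a) { double n = sqrt(vdot(a,a)); Vec3 r = { a.x/n, a.y/n, a.z/n }; return r; }

    void pairwise_geometry(Vec3 l11, Vec3 l12, Vec3 l21, Vec3 l22,
                           double *alpha, double *h, double *q1)
    {
        Vec3 e1 = vunit(vsub(l12, l11)), e2 = vunit(vsub(l22, l21));
        Vec3 m1 = { (l11.x+l12.x)/2, (l11.y+l12.y)/2, (l11.z+l12.z)/2 };
        Vec3 m2 = { (l21.x+l22.x)/2, (l21.y+l22.y)/2, (l21.z+l22.z)/2 };

        *alpha = acos(vdot(e1, e2));                  /* orientation difference */

        Vec3 n = vcross(e1, e2);
        double nn = vdot(n, n);
        if (nn < 1e-12) {                             /* near-parallel case     */
            Vec3 dm = vsub(m2, m1);
            double along = vdot(dm, e1);
            Vec3 perp = { dm.x - along*e1.x, dm.y - along*e1.y, dm.z - along*e1.z };
            *h = sqrt(vdot(perp, perp));
            *q1 = 0.0;
            return;
        }
        Vec3 d = vunit(n);
        *h = vdot(vsub(m2, m1), d);                   /* minimum distance apart */

        Vec3 m2p = { m2.x - (*h)*d.x, m2.y - (*h)*d.y, m2.z - (*h)*d.z };
        Vec3 e21 = vcross(e2, e1);
        *q1 = vdot(e21, vcross(e2, vsub(m2p, m1))) / vdot(e21, e21);
        /* s1 and t1 then follow as q1 + (m1 - l11).e1 and q1 + (m1 - l12).e1 */
    }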

6.3.1 Stored Table of {(s_i, t_i), (s_{i+1}, t_{i+1})}

An advantage of the use of the pairwise elements {(s_i, t_i), (s_{i+1}, t_{i+1})} is that they can be calculated separately for each pair of lines and stored in look-up tables. To accommodate errors and scale differences, an overlap range is provided. Errors are accounted for as follows: pairs of lines are allowed errors |η1| < ρ1 and |η2| < ρ2 on the location of their centroids (m1 and m2), and errors on their direction vectors (e1, e2), the latter expressed in terms of angles θ1 and θ2. On orientation differences, the interval is:

[max(α − θ1 − θ2, 0), min(α + θ1 + θ2, π)]

while for the minimum distance apart between (extended) lines the interval is

h ± (ρ1 + ρ2 + |q1| tan θ1)

For s1, t1 and s2, t2 the permissible ranges are:

s1 ± (ρ1 + ρ2 + |q2| (tan θ2 / sin α))
t1 ± (ρ1 + ρ2 + |q2| (tan θ2 / sin α))
s2 ± (ρ1 + ρ2 + |q1| (tan θ1 / sin α))
t2 ± (ρ1 + ρ2 + |q1| (tan θ1 / sin α))

A similar range of overlapping intervals is provided for pairs of conic features (i.e., arc lengths, radii, etc.).


6.3.2 The Matching Process

Object-model focus features are selected in sequence, the closest matches to the selected focus feature in the scene model are considered as potential matches, and neighbouring features are then considered in terms of their pairwise geometrical relationships. Found cliques are ranked according to the highest number of neighbourhood matches consistent with the object-model group size.

Transformations to place each found clique into its corresponding position in the scene are calculated using the method described by Faugeras et al. [FHPP84]. Cliques that produce near-identical transforms are considered as belonging to the same object. The mean transform that will transport the model into the scene is returned, facilitating the calculation of the pose of the object in the scene. The transform returned is the commonly used 4 × 4 homogeneous matrix comprising a 3 × 3 rotational submatrix and a 3 × 1 column translation vector.

The current sequence for matching a model with a scene is:
1. Build the pairwise geometric tables.
2. Match cliques.
3. Compute the transform of the object model that carries it into the scene.
4. Transform the model into the scene (placing and displaying it over the recognised object; this provides visual verification of the quality of the match).

Associated with the matching process are seven parameters whose values can be adjusted to reflect the confidence in the scene data, namely: (a) clique size, (b) position error, (c) rotation, (d) length threshold, (e) maximum rotation, (f) maximum translation, (g) length ratio.

6.4 Automation of Recognition Process

To automate the recognition process, the "current matching sequence" mentioned above was formulated into several nested iterative loops in which the seven parameters are systematically modified in a fuzzified, fine-to-coarse adjustment to reflect a reduction in data confidence levels. Iteration continues until either recognition occurs, at which stage the process is exited, or the set number of iteration levels is exceeded and recognition is deemed to have failed. If recognition fails, the next captured scene model is entered into the loop. This process proceeds as the pan-tilt-vergence head sweeps through a pan arc of −20 to +20 degrees, the tilt and vergence angles being held fixed, and is repeated at each scheduled observation station (waypoint).
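The control structure of this relaxation is sketched below. The parameter set, the starting values and the match_scene() routine are hypothetical place-holders for the seven Matcher-tool parameters and the clique-matching call; only the fine-to-coarse loop structure is illustrated.

    typedef struct {
        int    clique_size;
        double position_error, rotation_tol, length_threshold;
        double max_rotation, max_translation, length_ratio;
    } MatchParams;

    int match_scene(const MatchParams *p);     /* returns 1 on recognition */

    int recognise_with_relaxation(void)
    {
        MatchParams p = { 5, 0.005, 0.02, 0.01, 0.2, 0.1, 0.8 };  /* tight start */

        for (int level = 0; level < 4; level++) {        /* coarsen in levels    */
            for (int c = p.clique_size; c >= 3; c--) {   /* relax clique size    */
                MatchParams q = p;
                q.clique_size = c;
                if (match_scene(&q))
                    return 1;                            /* recognition achieved */
            }
            p.position_error  *= 2.0;                    /* loosen tolerances    */
            p.rotation_tol    *= 2.0;
            p.max_rotation    *= 1.5;
            p.max_translation *= 1.5;
        }
        return 0;                                        /* recognition failed   */
    }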

Fig 6.4.1 depicts the recognised cup and alphabet cube transformed into the scene; note that pose and scale have been correctly identified.


Fig 6.4.1 Model Cube and Cup scaled, and placed over objects in the scene

6.5 Model Creation (An Alternative Method)

With the above matcher software it is possible to create a 3D model-generating facility without the use of a rotating table. All that this requires is to form a sequence of single-view models of arbitrary orientations (by manually turning the object about), and to match each to the first (reference) view. The returned transform for each match would be used to sequentially integrate the matched model with the reference. This method certainly has appeal, as it makes the whole scheme more portable (in not needing the extra hardware) and more robust, in that the orientation returned by the transform is likely to be quite precise. The drawback of this method is that it requires laborious manual intervention: as the model grows with each sequenced integration of single model views, new focus features and clique groups need to be found and stored to ensure matched features in the next sequenced view. With the current model-creation method, sequenced views are integrated automatically, and clique groups need to be found only for the fully integrated finished model.

6.6 Pose for Object Retrieval

Pose information to carry the object-model into the scene (as outlined in section 6.3.2) is returned in the form of a transformation matrix. This matrix provides the necessary rotation and translation information to transport the 3D model into the scene. Observing the object-model correctly aligned with the object in the scene is useful for visual verification of a correct match. To provide a suitable grasping point for the robot arm's gripper, each object-model has a designated X, Y, Z location and angular yaw, pitch, roll information stored at the time of object-model creation. This positional information is also transformed into the scene along with the model; the numerical value is obtained by multiplying the grasp-point vector by the transform, the resulting vector being m_G (see Fig 6.7.1). It is important to appreciate that the model points transformed into the scene are all initially referenced to the left camera's coordinate frame, which does not remain fixed because of head motion. The head-centre frame C0, having its origin at the intersection of the pan and tilt axes, does remain fixed with the head's movement (


Fig 6.7.1). Using the head model derived in Chapter 4 (and more specifically equation 4.3.6), all transformed model points are referenced to frame C0; m_G thus transforms into m_G^C0.

6.6,1 An Alternative Method For Object Grasping.

The method described above relies on precise alignment and scaling of the transformed object into the scene. This stems from the fact that the grasping information originates from the object-model and needs to be accurately transported into the scene. A simpler alternative approach is to obtain an estimate of only the translation necessary for object grasping, and subsequently pick up all objects from the top using a wide-opening gripper. Translational information can readily be obtained from the 3D centroid of the matched scene-object features (i.e., lines and conics), i.e., for n matched centre points m_i:

Centroid = (1/n) Σ_{i=1}^{n} m_i

Unfortunately, wide opening and closing grippers are not readily obtainable.

6.7 Retrieval of the Object

As our robot arm cannot be mounted onboard the mobile base due to space considerations, it is currently mounted on a castor platform and manually docked with the mobile base to facilitate the testing and development of the algorithms. The construction of a second (transport) robot mobile base is in progress (see Fig 6.7.2 and Appendix B for design details), and in the next phase of this work this base will be summoned by a wireless link to dock (automatically) with the main (scout) mobile base following object recognition.

When docked, the base frame of the robot arm remains at a fixed position relative to the head frame C0; consequently, since the located object's pose is known in this head frame, it is a simple matter to obtain the pose of the object in the base frame of the arm.

Let T_Object^C0, T_C0^Arm and T_Object^Arm respectively represent the homogeneous transforms of the object with respect to the main head frame, the main head frame with respect to the robot arm base, and the object with respect to the robot arm base.

T_Object^C0 = [ n_xo  o_xo  a_xo  p_xo ]        T_Object^Arm = [ n_xa  o_xa  a_xa  p_xa ]
              [ n_yo  o_yo  a_yo  p_yo ]                       [ n_ya  o_ya  a_ya  p_ya ]
              [ n_zo  o_zo  a_zo  p_zo ]                       [ n_za  o_za  a_za  p_za ]
              [ 0     0     0     1    ]                       [ 0     0     0     1    ]

where the 3 × 3 submatrix comprising the column vectors n, o, a represents the orientation of the frame of interest (i.e., the subscripted frame, Object) with respect to the reference frame (i.e., the superscripted frame), and the column vector p represents the translation between the two frames.


The object in the base frame of the robot arm is obtained from:

T_Object^Arm = T_C0^Arm T_Object^C0     6.7.1

If however T_C0^Arm is not easily measured, it can be obtained with the aid of a test object from:

T_C0^Arm = T_Object^Arm (T_Object^C0)⁻¹     6.7.2

To find T_C0^Arm it is simply a matter of manually guiding the arm to pick up the test object, noting the arm X, Y, Z coordinates required to achieve this, and then using equation 6.7.2. In the docked position the robot arm's reference coordinate frame was aligned parallel to the head frame so that X_C0 → Y_Arm, Y_C0 → Z_Arm and Z_C0 → X_Arm. The relative orientation of the head frame with respect to the robot arm is shown in Fig 6.7.1. The associated transform T_C0^Arm was determined to be:

T_C0^Arm = [ 0.0  0.0  1.0   −627.0 ]
           [ 1.0  0.0  0.0  −1040.0 ]
           [ 0.0  1.0  0.0    711.0 ]
           [ 0.0  0.0  0.0      1.0 ]

with distances specified in mm.

The transforms were verified by grasping an object with the following position variables, supplied by the arm's controller, and comparing the results with those obtained from the transform products.

X: 220.0   Y: −570.0   Z: 270.0   Roll: *   Pitch: *   Yaw: *
(* as required for grasping)

Choosing a local object coordinate frame (i.e., a frame attached to the object) with the orientation shown in Fig 6.7.1,

T_Object^Arm = [ −1.0  0.0   0.0   220.0 ]
               [  0.0  1.0   0.0  −570.0 ]
               [  0.0  0.0  −1.0   270.0 ]
               [  0.0  0.0   0.0     1.0 ]

and


T_Object^C0 = [  0.0  1.0   0.0   470.0 ]
              [  0.0  0.0  −1.0  −441.0 ]
              [ −1.0  0.0   0.0   847.0 ]
              [  0.0  0.0   0.0     1.0 ]

Using T_Object^Arm = T_C0^Arm T_Object^C0 results in

T_Object^Arm = [ −1.0  0.0   0.0   220.0 ]
               [  0.0  1.0   0.0  −570.0 ]
               [  0.0  0.0  −1.0   270.0 ]
               [  0.0  0.0   0.0     1.0 ]

as expected.

Several test runs were carried out using this "parallel" arm orientation (see Fig 7.2.12); however, the arm's reach restricted retrieval to scene-objects in very close proximity.

To obtain greater reach, the robot arm was rotated 45 degrees towards the head (i.e., T^{Co}_{Arm} requiring a 3 x 3 orientation sub-matrix as below; the ? values are to be determined).

T^{Co}_{Arm} =
[ -0.707   0.707   0.0   ? ]
[  0.0     0.0     1.0   ? ]
[  0.707   0.707   0.0   ? ]
[  0.0     0.0     0.0   1.0 ]

Once again T^{Arm}_{Object} was determined from the arm controller:

T^{Arm}_{Object} =
[ -0.707   0.0   0.707   545.0 ]
[  0.707   0.0   0.707     0.0 ]
[  0.0     1.0   0.0     429.0 ]
[  0.0     0.0   0.0       1.0 ]

The object with respect to the head was manually measured to give

T^{Co}_{Object} =
[ 1.0   0.0   0.0    747.0 ]
[ 0.0   1.0   0.0   -246.0 ]
[ 0.0   0.0   1.0   1003.3 ]
[ 0.0   0.0   0.0      1.0 ]

(Note that this time an object-frame parallel to the head-frame Co has been arbitrarily chosen, resulting in a unit 3 x 3 sub-matrix.)

Using:

T^{Co}_{Arm} = T^{Co}_{Object} (T^{Arm}_{Object})^{-1}

this gives

T^{Co}_{Arm} =
[ -0.707   0.707   0.0    460.0 ]
[  0.0     0.0     1.0   -675.0 ]
[  0.707   0.707   0.0    618.0 ]
[  0.0     0.0     0.0      1.0 ]

Manual measurements confirmed these values to be correct.

This T^{Co}_{Arm} transform was used in all subsequent scene-object retrieval trials.

Others, i.e., Jagerand et al [JFN97], use visual servoing rather than joint control to successfully manipulate objects. This method requires the establishment of a visual-motor Jacobian, obtained by visually observing changes during object movements and relating these to particular controller commands.

In this work the object grasping transform T^{Arm}_{Object} is obtained directly once the object's pose, relative to the main head-frame, is obtained.

Fig 6.7.1 Relative position and orientation of head and arm coordinate frames remain fixed

Fig 6.7.2 Partially completed transport mobile base (the addition of wireless communication and auto-docking capability is a future task)

6.8 MROLR System Simulator

The development of a "MROLR System Simulator" was undertaken and is at an early stage of completion. The Simulator will be provided with models representing the mobile base, landmark features, a target-object and a docking robot arm. In its present state it consists of two non-integrated modules. Each module is based on, and builds upon, the work of existing simulators. The first module utilises and builds upon the code associated with a simulator that is part of A. Davison's SCENE package [DavOl]. The second module is based on and builds upon Gurvinder Pal Singh's "Virtual Robotics Lab" [Sin99]. As Gurvinder's software did not include a robot compatible with the UMI RTX, this robot was coded and added. In addition, the software was extended to provide the necessary additional degrees of freedom (an increase from 5 to 6) required by the RTX. Inverse kinematic solutions were also not included with the simulator; these important algorithms were coded and added to facilitate direct movement to specified "x, y, z" positions with "yaw, pitch, roll" gripper orientations. Details of these additions are given in appendix B. Extra push buttons were also added to the "Control Pad" to facilitate these new movements (see Fig 6.8.4).

Module 1 graphically models the moving vehicle navigating to specified waypoints with the aid of displayed (selectable) landmark features. It utilizes the same EKF and mapping algorithms used by MROLR (with full covariance estimation of vehicle location and landmark features). A random generator is incorporated to simulate wheel slippage and noise. Other features include the display of true positions in yellow and estimated positions in green. Landmark features can be selected or deselected by mouse clicks; selected features change to red. Various control options and data outputs are provided with the associated GUIs that are part of the simulator (see Fig 6.8.2). It is envisaged that this simulator will provide a useful facility for evaluating alternative feature selection, mapping and navigational strategies.

Module 2 is intended (ultimately) for evaluating proposed grasping points of selected 3D target object-models to ensure they can be retrieved. Even with precise modeling of the robot "end-effector" this is quite a difficult study, and one which the author believes has had insufficient attention. For instance, how does one design a generic set of rules for successfully picking up the vast number of assorted shapes and sizes of domestic objects with handles? Standard kinematic modeling incorporating consideration of size, volume, distribution of weight, balance of forces, and surface normals is possibly a start. Such analysis could be incorporated into this simulation unit. An initial attempt at assessing the suitability of a "grasping site" on an object, based on gripper opening size and the surface normals, was added to the module. The requirement for a suitable "grasp" is that the object's dimensions be appropriate, and that the surface normals at the points of contact and the gripper's surface normals are in close (opposing) alignment (Fig 6.8.1). For situations where the grasping site was deemed unsuitable, the gripper "slid" over the surface in an attempt to locate a suitable area. See Fig 6.8.6 for an example of the gripper attempting to pick up a teapot at an unsuitable location, and the gripper slipping "laterally" towards the teapot handle. This approach was later considered unrealistic, as it took no account of the surface roughness, texture, or frictional forces involved, and was abandoned. Only the object's physical dimensions are currently considered in assessing grip ability.
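For illustration, a minimal sketch of this kind of grasp-site test (assumed data structures and tolerance, not the simulator code): the local object width must fit inside the gripper opening and the contact surface normals must closely oppose the jaw normals.

#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static double length(const Vec3& a) { return std::sqrt(dot(a, a)); }

// A grasping site is accepted when (i) the local object width fits inside the
// gripper opening and (ii) the object surface normals at the two contact points
// are in close opposing alignment with the gripper jaw normals.
bool graspFeasible(const Vec3& jawNormal,      // inward normal of jaw A (jaw B closes along -jawNormal)
                   const Vec3& surfNormalA,    // outward object normal at contact A
                   const Vec3& surfNormalB,    // outward object normal at contact B
                   double objectWidth, double gripperOpening,
                   double alignToleranceDeg)
{
    if (objectWidth > gripperOpening) return false;
    const double cosTol = std::cos(alignToleranceDeg * 3.14159265358979 / 180.0);
    double ca = dot(jawNormal, surfNormalA) / (length(jawNormal) * length(surfNormalA));
    double cb = dot(jawNormal, surfNormalB) / (length(jawNormal) * length(surfNormalB));
    // Opposing alignment: jaw A against normal A should be anti-parallel (ca near -1),
    // jaw B (-jawNormal) against normal B anti-parallel, i.e. cb near +1.
    return (-ca >= cosTol) && (cb >= cosTol);
}

int main()
{
    Vec3 jaw = {1.0, 0.0, 0.0};
    Vec3 nA  = {-1.0, 0.0, 0.0};   // flat face towards jaw A
    Vec3 nB  = {1.0, 0.0, 0.0};    // opposite flat face
    printf("feasible: %d\n", graspFeasible(jaw, nA, nB, 60.0, 80.0, 15.0));
    return 0;
}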


Fig 6.8.1 Gripper normals (brown) non aligned with object surface normals (blue).

Both modules, in addition to the uses outlined above, will make useful tools for "MROLR System" teaching and demonstration. Where used solely for simulating MROLR, the modules have had their caption titles modified to display "MROLR".

Module 1 is shown in Fig 6.8.2: (a) shows the mobile robot navigating to a specified waypoint with six landmark features; (b) depicts an example with ten known landmark features added. Examples of "Auto Feature Selection" and covariance outputs from the Simulator are given in Fig 6.8.3.

Module 2 is displayed in Fig 6.8.4 through Fig 6.8.9, illustrating the "controller" and "arm" retrieving an object that is difficult to grasp (the small teapot), and moving it to several locations. Fig 6.8.10 displays the arm after having transported an object (the red sphere) some distance from the table. (Both the teapot and sphere model objects were borrowed from the MS DirectDraw SDK.)

(a) Scene with 6 unknown landmark features

(b) Scene with 16 features (10 known features)

Fig 6.8.2 Several simulated views of the mobile base navigating to waypoints

[Simulator console output: ten known features are added with labels, 8 waypoints are read, and at each "SLOW PREDICTION" / "SLOW UPDATE" step the best-scoring feature is auto-selected (labels 2, 9, 3 and 5 in this run) while the measurement noise R_tot and innovation covariance S matrices are printed.]

Fig 6.8.3 Examples of auto-feature selection and covariance outputs from simulator corresponding to Fig 6.8.1(b).

[Screenshot: "MROLR 3D Simulator" window with the Control Pad, showing the X, Y, Z gripper coordinates, joint jog buttons (Waist, Shoulder, Arm1, Arm2, Wrist Pitch, Wrist Roll, +/-), Close, Home, Stepping angle, Teach Mode On and Run InvKin controls.]

Fig 6.8. 4 Robot arm at table with target object (small teapot)

Notice that the X, Y, Z positions of the gripper are displayed on the Controller.


Fig 6.8. 5 Arm with gripper over target object


Fig 6.8. 6 Gripper closes and slips over surface of object towards handle in search of more appropriate grasping location.


Fig 6.8. 7 Target being raised

Fig 6.8.8 Arm showing joint movements

Fig 6.8.9 Arm showing joint movements


Fig 6.8.10 Arm moving away from table with retrieved red sphere

A useful possible extension to the simulation modules (discussed further in Chapter 8) would be for all created objects to be positioned in the simulated scene; mouse selection of the target object (or verbal selection, if voice recognition were added) could then initiate a simulation run.

6.9 Conclusion

This chapter commenced by citing the advantages of using wire-frame outline models created from multiple views, and described a simple and effective method for their production. For poorly formed models an editor is provided that enables both the addition and removal of 3D segments. Paramount to this work is the ability to reliably recognise objects in a scene whose models have been created and maintained in a database of objects. A recognition algorithm based on the work of Pollard et al was implemented, and extended to automate the recognition process. The recognition module also provides the essential 3D pose parameters, relative to the stereo head, to guide the docked robot arm towards the sought object's grasping point.

With the conclusion of this and previous chapters, the framework to perform tests to ensure a fully functioning MROLR has been established. Testing and verification are carried out in Chapter 7.

Chapter 7 Testing and Verification

7.0 Introduction

The algorithmic development and modeling of the previous chapters is now complete, and the necessary tools to create and edit 3D object-models that may be requested for future retrieval have been developed. The means for navigating the environment exist. Searching for and mapping landmark features with the aid of active vision at each step, and using the resulting data for self-localisation purposes, reduces the likelihood of "loss of bearings" due to wheel slippage or collision. Once in close proximity to a scene containing any of the objects for which a created model exists, a search by panning the stereo head can be performed. If the object is successfully recognised, its 3D position in the scene with respect to a docked robot arm can be ascertained. The resulting homogeneous transform, formulated together with the a priori specified handle location or grasping point, should guide the arm to pick up the object and place it on board MROLR. Limited simulations of MROLR can also be performed using the "virtual environments" of modules 1 and 2 described in section 6.8.

The vital task of testing MROLR to vindicate the methodology of this research remains. Tests need to include the creation of a significant number of typical domestic or industrial object models, as well as navigational, recognition and retrieval trials. These are described in this chapter.

The results of the creation of ten representative domestic object-models are given in section 7.1. The results of object location and retrieval trials are given in section 7.2; the scope and limitations of the tests, their evaluation, and methods of overcoming some of those limitations are discussed in the subsections.

7.1 Creation of Database Models

Ten representative domestic object-models were created using the portable (level adjustable) model creating table described in section 6.1. These were:

• large cup
• vice
• shaver
• bottle
• Solo can
• cube
• hole punch
• stapler
• drink package
• cup

Their creation is shown in Figs 7.1.1 to 7.1.11. Note that for conservation of space, only one stereo view of each object is shown from Fig 7.1.3 onwards.


Fig 7.1.1 Object model creation


Fig 7.1. 2 (a) 3 Stereo views of "Large Cup", (b) Model of "Large Cup"

Fig 7.1.3 (a) Single stereo view of "Vice", (b) Model of "Vice"

Fig 7.1.4 (a) Single stereo view of "Shaver", (b) Model of "Shaver"


Fig 7.1. 5 (a) Single stereo view of "Bottle", (b) Model of "Bottle"


Fig 7.1. 6 (a) Single stereo view of "Can", (b) Model of "Can'


Fig 7.1. 7 (a) Single stereo view of "Cube", (b) Model of "Cube"


Fig 7.1. 8 (a) Single stereo view of "Hole Punch", (b) Model of "Hole Punch"


Fig 7.1. 9 (a) Single stereo view of "Stapler", (b) Model of "Stapler"


Fig 7.1.10 (a) Single stereo view of "Drink Package" , (b) Model of "Drink Package"


(a)

(b)

Fig 7.1.11 (a) Single stereo view of "Cup", (b) Model of "Cup"

7.1.1 Storage of Focus Features and Cliques

Fig 7.1.12 illustrates the stored format of focus features and cliques. The first number (10 in this example of the model cup) is the total number of cliques for the model. The following lines contain (in order) the focus feature identification number (i.e., 560), the number of features in the clique (i.e., 9), followed by the Id number of each feature. The focus feature is also normally listed as the end feature of a clique.

10
560 9 1490 556 568 3086 1122 554 550 564 560
564 8 564 550 1098 1122 1096 568 556 1490
554 12 570 568 3086 1112 1098 1096 556 564 1490 3080 1122 554
554 7 1104 564 1132 560 1488 3076 554
1482 7 1482 1132 556 554 1102 1488 546
564 7 1490 556 1102 3080 560 1482 564
1488 5 1488 1132 554 1122 1112
564 5 1490 1482 556 554 564
554 4 554 1488 564 1482
1482 5 1490 1482 556 554 564

Fig 7.1.12 Table of focus feature and cliques
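For illustration, a minimal sketch of a loader for this stored format (whitespace-separated integers assumed; the file name is hypothetical, and this is not the TINA/MROLR code):

#include <fstream>
#include <iostream>
#include <vector>

struct Clique {
    int focusId;              // focus feature identification number
    std::vector<int> members; // Id numbers of the features in the clique
};

// Read the clique table: first the total number of cliques, then for each
// clique the focus feature id, the member count, and the member ids.
std::vector<Clique> loadCliques(const char* path)
{
    std::ifstream in(path);
    int total = 0;
    in >> total;
    std::vector<Clique> cliques(total);
    for (int i = 0; i < total; ++i) {
        int count = 0;
        in >> cliques[i].focusId >> count;
        cliques[i].members.resize(count);
        for (int j = 0; j < count; ++j) in >> cliques[i].members[j];
    }
    return cliques;
}

int main()
{
    std::vector<Clique> cliques = loadCliques("cup_cliques.txt");   // hypothetical file name
    std::cout << cliques.size() << " cliques loaded\n";
    return 0;
}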


Several of these created models will be targeted in the tests that follow in Section 7.2

7.2 Tests

Trials were arranged to substantiate MROLR's ability to navigate to specified waypoints, and to recognise, locate and retrieve (grasp) a variety of target objects in various orientations and locations on a table at each waypoint. Lengthy tests to demonstrate the validity of the navigational algorithms were not deemed necessary, as others (i.e., [Dav98], [DavOl]) have already done these using the Oxford GTI mobile platform. While our active stereo head and two-wheel-drive mobile base differ from those of the Oxford project, resulting in the modeling and reformulation outlined in chapters 4 and 5, the methodology and principal algorithmic constructs used are identical. The navigational, self-localisation and map-building tests performed here can be regarded as reinforcement of those obtained by A. Davison. The major testing carried out in this section concentrates on object recognition, pose determination and retrieval.

It is worth mentioning that locating objects in positions other than an "expected region", such as a waypoint, is beyond the present ability of MROLR. To overcome this limitation and enable objects randomly located in a room to be found would require vastly superior cameras and lenses, perhaps with auto focus, zoom and accurate focal point re-calibration. In addition, a path-planning module capable of mapping open spaces that could be negotiated would need to be incorporated.

For each test devised, MROLR commenced its journey in the laboratory from a fixed world coordinate (waypoint 0). Following initialisation, the vehicle navigated autonomously using odometry and visual navigational features, mapping the environment as it travelled. On reaching its destination the head commenced panning for the target object and, if this was located, the robot arm was docked with the platform and the desired object grasped and retrieved. If the object could not be found (perhaps because the object was not on the table), MROLR returned empty handed to waypoint 0.

Initialisation data supplied to MROLR consisted of known-features, waypoints to navigate to, and the target object and its associated grasp point (if applicable). As MROLR autonomously navigated towards each waypoint the following behaviour was observed:

(a) head fixation angles for the nearest known-feature were calculated (from stored x, y, z positional information), and the head cameras were driven to gaze in the direction of the feature. A search for a corresponding "patch match" (within a narrow spatial window whose size was determined by map uncertainty) ensued.

(b) if a fixation could not be attained, an alternative mapped or new feature was sought. The decision as to which feature to choose was based on two criteria: expected visibility, and the value of the measurement. Once a measurable subset of features in the map was identified, the value of measuring each one was evaluated in terms of the uncertainty of its position relative to the vehicle. The choice was then made on the basis of the highest innovation covariance (i.e., "the Vs rule", see section 5.7); a schematic sketch of this selection rule is given after this list.

(c) the map continued to be updated sequentially as the robot moved about in its environment, making measurements of features and updating the state vector and related covariances according to the rules of the Extended Kalman Filter.
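As an illustration only (assumed data structures, not Davison's SCENE implementation), the selection in (b) amounts to scanning the measurable candidates and keeping the one with the largest innovation-covariance score:

#include <cstddef>
#include <vector>

struct Candidate {
    int label;          // landmark feature label
    bool visible;       // expected visibility from the current vehicle pose
    double innovation;  // scalar score derived from the innovation covariance S
};

// Return the label of the measurable feature with the highest innovation
// covariance score ("the Vs rule"), or -1 if none is measurable.
int selectFeature(const std::vector<Candidate>& candidates)
{
    int best = -1;
    double bestScore = -1.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (!candidates[i].visible) continue;
        if (candidates[i].innovation > bestScore) {
            bestScore = candidates[i].innovation;
            best = candidates[i].label;
        }
    }
    return best;
}

int main()
{
    std::vector<Candidate> c = { {2, true, 5.2e-7}, {9, true, 6.2e-7}, {3, false, 9.0e-7} };
    return selectFeature(c) == 9 ? 0 : 1;
}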

Several views of MROLR navigating to the "Table" waypoint are shown in Fig 7.2.1. The subsequent "lock on" to the known-feature, the "fire extinguisher sign", is shown in Fig 7.2.2. Additional features used for mapping the laboratory environment are also displayed in Fig 7.2.3.

(a) Robot navigating towards specified waypoint

(b) Robot closer to specified waypoint


(c) Head pointing at known-feature "Fire Extinguisher"

Fig 7.2.1 (a), (b) and (c) Several views of MROLR navigating towards specified waypoint, "the Table".

Feature location (w.r.t. head): x = 0.123, y = 1.345, z = 1.732 (metres)

Fig 7.2. 2 Lock on achieved for "Fire Extinguisher" sign feature patch.

Fig 7.2.3 Two additional mapped features: (a) light fitting edge, (b) clock face, intersection of hands

In Fig 7.2.4(a), the robot is at a waypoint panning for the hole-punch. Fig (b) shows the left image of a scan sequence search for the hole-punch in the scene, while Fig (c) shows the database model of the hole-punch. Fig (d) illustrates the resulting created scene-model corresponding to the scene-view in Fig (b), and Fig (e) the model transformed into the scene following recognition. Pose information relating to the found "hole-punch" is given in Fig 7.2.5. The processes leading to the recognition and grasping of the "bottle" and "cup" are shown in sequences Fig 7.2.6 through Fig 7.2.12.

Fig 7.2.4 Hole-punch search: (a) panning robot at a waypoint, (b) table scene: captured left image, (c) model of hole-punch, (d) scene-model of (b), (e) hole-punch found and model transformed into the scene.

Transformation matrix and related rotations:

 0.555833   0.315053  -0.763280
-0.351152   0.927769   0.126241
 0.753487   0.199965   0.626316

493.333318  -154.193161  270.673859

Rotation about x axis = 17.706822 degrees
Rotation about y axis = -48.893348 degrees
Rotation about z axis = -32.282987 degrees

Fig 7.2. 5 Pose information of "Hole Punch"
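The angles reported in Fig 7.2.5 are consistent with a standard roll-pitch-yaw decomposition of the 3 x 3 rotation block; a small sketch of that extraction (illustrative only, not the thesis code):

#include <cmath>
#include <cstdio>

// Extract rotations about the x, y and z axes (degrees) from a row-major
// 3x3 rotation matrix, assuming R = Rz(yaw) Ry(pitch) Rx(roll).
void rotationAngles(const double R[3][3], double& rx, double& ry, double& rz)
{
    const double rad2deg = 180.0 / 3.14159265358979323846;
    ry = -std::asin(R[2][0]) * rad2deg;
    rx =  std::atan2(R[2][1], R[2][2]) * rad2deg;
    rz =  std::atan2(R[1][0], R[0][0]) * rad2deg;
}

int main()
{
    // Rotation block of the "hole punch" pose from Fig 7.2.5.
    double R[3][3] = { {  0.555833, 0.315053, -0.763280 },
                       { -0.351152, 0.927769,  0.126241 },
                       {  0.753487, 0.199965,  0.626316 } };
    double rx, ry, rz;
    rotationAngles(R, rx, ry, rz);
    printf("about x: %.2f  about y: %.2f  about z: %.2f (degrees)\n", rx, ry, rz);
    return 0;
}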

Fig 7.2.6 Left image of "Bottle" in scene

Fig 7.2. 7 3D models of "Bottle" and "Cup"


(a)

(b)

Fig 7.2. 8 (a) 3D model of "Bottle" in scene (b) Recognised "Bottle" transported into the scene


(a)

(b)

Fig 7.2. 9 (a) Left image of "Cup" in scene, (b) 3D model of "Cup" in scene, (c) Recognised "Cup' transported into the scene

(a) Pose of Cup in scene (computed transform w.r.t. the camera frame):

-0.769674   0.220012  -0.599330
-0.294048   0.711089   0.638662
 0.566689   0.667793  -0.482613

563.264954  -596.585205  1412.442627

Scene object found with head at X = -47.397186, Y = -411.613678, Z = 980.958069

(b) Grasping transform:

-1.00   0.00   0.00   353.96
 0.00   1.00   0.00  -664.40
 0.00   0.00  -1.00   299.39

Fig 7.2.10 Cup transforms for transportation into scene and grasping

Fig 7.2.11 (a) Head gazing at scene (b) Docked robot arm


(a) (b)

Fig 7.2.12 Robot retrieving located objects

7.2.1 Processing Times

At present the average navigation time for a trip to waypoint 2 (approx. 1.49m), including recognition and object retrieval, is 13.5 minutes.'

The journey consisted of 2 angular base rotations and 9 translations. The linear step size chosen for the mobile base was 200 mm. The final linear step size was halved until the remaining distance was less than 32 mm (i.e., for a linear movement of 1490 mm, the linear steps consist of 7 x 200 mm, 1 x 45 mm, 2 x 22.5 mm). EKF measurements were performed at each step; these included up to 3 head adjustments to achieve feature lock-on. Objects sought were the bottle, cup and cube (the cube cannot be grasped with the current grippers).

The average time is based on 6 runs, time to manually dock the arm is not included.
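For illustration, a small sketch of the stepping scheme described above (not the MROLR base-controller code): full 200 mm steps are taken while possible, then the remaining distance is repeatedly halved until it falls below 32 mm, with the residual taken as the final step.

#include <cstdio>
#include <vector>

// Generate the sequence of linear steps for a translation of 'distance' mm.
std::vector<double> linearSteps(double distance, double stepSize = 200.0, double minRemaining = 32.0)
{
    std::vector<double> steps;
    while (distance >= stepSize)     { steps.push_back(stepSize); distance -= stepSize; }
    while (distance >= minRemaining) { steps.push_back(distance / 2.0); distance /= 2.0; }
    if (distance > 0.0) steps.push_back(distance);   // final residual step
    return steps;
}

int main()
{
    // For 1490 mm this reproduces 7 x 200 mm, 1 x 45 mm, 2 x 22.5 mm.
    std::vector<double> s = linearSteps(1490.0);
    for (double d : s) printf("%.1f ", d);
    printf("\n");
    return 0;
}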


Average processing times:

(i) feature measurement and localisation during navigation: 40 sec
(ii) object recognition²: 4 min
(iii) object retrieval³: 35 sec

For a viable commercial system, fast processing and operating times are crucial. The work carried out in this research is essentially only at a prototype stage, and processing times have not been a major consideration, for the following reasons. Emphasis has been on proving algorithms, integrating system modules and establishing a working system with the hardware resources available. The fastest PC used for MROLR has only a 90 MHz processor. This, however, has not been the greatest limiting factor to processing speed. Our BiSight head has an undesirable low-frequency mechanical resonance that is often excited following either head or mobile base movement. A delay of several seconds following either head movement or base movement is necessary to allow such head oscillations to dissipate and prevent blurred images.

In addition, several image formats are currently being used and converted on the fly to the required format. Images are also transmitted serially from the on-board PC to a stationary one.

Each of these overheads can be eliminated with new hardware and software rewrites and improvements. It is worth noting that Davison [Dav98] attained real-time tracking of approximately 5 measurements/second during navigation and map building. MROLR should be able to achieve similar performance. Given that PC processors now operate in excess of 1.5 GHz, it is not difficult to envisage a possible time reduction of at least an order of magnitude (i.e., from 13.5 minutes to 1.35 minutes).

7.2.2 How Reliably can Target-objects be Retrieved?

Successful recognition and retrieval outcomes will in general occur if the following rules are adhered to:

• objects are within reach of the arm (approx. 500 mm) and graspable
• spacing between objects is sufficient to allow the gripper to move between them
• spacing between objects is at least as great as the largest linear dimension of the object; this will ensure clique features of one object do not overlap with cliques of another during the matching process
• ambient lighting is sufficient and remains relatively constant
• the surface on which the objects lie is of a dark, non-reflective texture that minimises glare and the formation of shadows

² From the time the head is in the correct position (i.e., pan search times not included).
³ Following recognition.

Following recognition, the object's pose is determined via precise geometric computations and well-established analytical methods; therefore, given perfect data, its location and orientation will be accurately found.

This data consists of:

• camera calibration parameters,
• stereo head calibration parameters,
• 3D object-models and 3D scene-models,
• a homogeneous (4 x 4) transform of the object-model with respect to its position in the scene,
• grasp location data (in terms of x, y, z, yaw, pitch, roll), obtained from the 3D object-model,
• a homogeneous (4 x 4) transform of the head coordinate frame with respect to the robot arm's base coordinate frame,
• a homogeneous (4 x 4) transform relating the object's grasping location with respect to the robot arm's base coordinate frame. This final transform directs the arm (driven by its controller) to retrieve the object.

It is not feasible to accommodate all possible requested gripper grasping orientations, as the robot arm has a limited reach and range of movements. In addition, the propagation of data errors that will inevitably accumulate may also on some occasions preclude successful object retrieval. Quantification of this uncertainty is of course desirable; the considerable number of variables that may affect outcomes, however, makes this task difficult. Consider for example just three variables: lighting conditions, object local surface features, and textures. These affect the quality of the scene-model obtained, and consequently the accuracy of recognition and the resulting transform.

The dependence on a transform of high precision to solely guide the robot arm is undoubtedly a fundamental weakness of the retrieval system. In effect the object retrieval system is an open-loop one. An improvement in the form of visual feedback (visual servoing) could be added to make the retrieval process more robust, either by more effective use of the existing stereo head and/or with the supplementation of a miniature camera attached to the gripper (i.e., an eye in the hand).

Further testing to obtain comprehensive quantitative and qualitative performance metrics for MROLR, i.e., bounds on recognition and retrievability, will be performed once the automated docking transport base with arm is an integral working part of the system. It is likely that visual servoing, as described above, will also have been added by then. These enhancements are further discussed in Chapter 8.

7.3 CD with Video Clips of MROLR Performing

Three short video clips of MROLR performing several routines are provided on the accompanying CD.

The first shows the automated model production of the alphabet cube on the lighter of the portable modeling tables. The completed model is finally rotated about its axes to show the typical duplication of edge lines following integration. The cleaner model, after their removal, is also shown.

The second shows MROLR navigating to the table of objects, waypoint 2, tracking two known-features as it travels. On reaching the table the target-object, in this instance the alphabet cube, is recognised and the model is correctly transported into the scene. The mobile base then navigates home (waypoint 0), again tracking the same two known features. A black-coloured ellipse of measurement uncertainty is displayed in close proximity to the feature being tracked.

The third video clip commences with the mobile robot at the table of objects; it then proceeds to recognise the target object, the bottle in this instance. The robot arm is manually docked and the bottle retrieved. Next the bottle is manually relocated to two additional positions on the table, and the process of recognition and retrieval repeated. Of these two additional attempts, only one retrieval was successful, the other being close (within 1 cm of success).

A fourth video clip showing the "transport mobile robot" under test, carrying the arm, has also been provided. This vehicle is not yet able to be remotely communicated with, or to dock with, the "scout master mobile base".

7.4 Conclusion

The tests carried out in this chapter commenced with the creation of a small database of ten 3D models of commonly found household items. MROLR was then directed to autonomously navigate to selected waypoints in search of objects selected from the database. It successfully performed these navigational tasks, mapping its way while simultaneously tracking several known landmark features. On reaching the waypoint destination (in each case a table laden with objects), the active head panned the scene in search of the object. With the object present and in range of the arm, recognition, pose determination and retrieval occurred. Obstacle avoidance was not attempted during these tests, as MROLR has only a limited ability to detect obstacles (refer section 5.8.1.2). These results confirm the methodology of this research and are sufficiently encouraging to support continuation of the project.

The contributions of this work, and planned future improvements and extensions (several of which are currently in progress), are summarised in Chapter 8.


Chapter 8 Conclusions and Further Work

8.0 Introduction

In this, the final chapter, the main contributions of this work are summarised and proposed extensions for future continuation outlined.

8.1 Main Contributions of This Work

The major contribution made by this research is in the design and implementation of a new type of "service robot", comprising a navigating mobile robotic platform with active vision and object retrieval abilities. Given a specified destination (with reference to a home base coordinate frame), its navigational capability enables it to reliably localise and map-build in an unknown indoor environment, using information from measurements of arbitrary landmark-features and robot odometry. The resulting map is also available for future navigational purposes such as the return journey, or recovering from accumulated positional errors that are a result of estimation errors and wheel slippage.

The specified destination is required to be in close proximity (< 0.5 metres) to a surface, such as a table, on which a desired object is to be located and retrieved. The recognition ability of MROLR enables it to compare a 3D (a priori developed) model to scene-objects modeled while surveying the location and, if a suitable match is obtained (based on 3D geometric features), verify its pose by aligning the 3D model with the 3D matched scene-model. Using information from pose measurements, a transform is established locating the sought object with respect to the active head. This transform is then available to guide the docked robot arm to grasp, retrieve, and place the object on the platform, for transport back to the base home position.

8.1.1 Related Contributions

Related contributions stem from:

• Development of automated 3D object-model creation facilities. This consisted of the design and construction of two portable motorized turntables, and related software modules. An object to be modeled is placed on the table and the active stereo head forms 3D view-dependent wire frame models as the table is accurately rotated. These sequences are finally integrated to form a complete (view-independent) 3D model suitable for storage in a database. The establishment of a model database of sought objects is a prerequisite for their use in the recognition process.

• Development of a model editor to facilitate the manual removal and/or addition of 3D segments in formed models. The editor was built on the incomplete "geomstat" module of TINA. The major contribution to the geomstat software module was the addition of an


"edited model" output facility. The need for the addition of 3D lines to models arises on occasion as a result of poor lighting conditions, or edges that are difficult to match. More commonly, line duplication occurs due to slight misalignments during the integration process, and the editor now allows these to be removed.

• Preliminary design and construction of a "slave transport" mobile robot base and related software has commenced. This mobile robot will carry the robot arm that ultimately grasps and retrieves the target object. Once the "master scout" mobile base has accomplished its task of navigating to and locating the desired object, the "transport" base will be sent a command to dock with the "master". Docking ensures that a known, predetermined distance exists between the gripper and the head coordinate frame.

• Extension and modification of existing recognition algorithms to facilitate automated recognition while searching and scanning the scene. The added algorithm utilises nested iterative loops in which the seven parameters are systematically modified (in a "fuzzified"-like, fine-to-coarse adjustment) to reflect a reduction in data "confidence levels".

• Extension and modification of existing active-vision-based navigation software to work with the designed mobile base. This included the writing of an obstacle detection and avoidance module.

• Development of kinematic solutions from object pose information, to enable the robot arm to grasp and retrieve the desired object.

• Preliminary implementation of a "3D Virtual Environment" for simulating MROLR. This will be useful for evaluating alternative map-building strategies and object grasping locations, as well as for demonstration and educational purposes.

8.2 Future Directions and Work

As stated at the conclusion of Chapter 1, motivation for this work stems from the desire to ultimately develop a system capable of rapid and reliable retrieval of objects in both domestic and industrial environments. Potential applications for such an improved system range from simple domestic and industrial "robotic aids", to "assistants" for the severely physically disabled or visually impaired.

The work completed to date goes a considerable way towards meeting the objectives of this thesis. However, MROLR is a prototype, and to develop it into a useful product requires improvements in several areas. Presently, self-localisation and map building is quite a slow process. There are several reasons for this. Navigational feature measurements are only performed while the base is stationary, and several image format conversions are required. The stationary-base requirement is in part dictated by the Extended Kalman Filter algorithm requiring constant distance increments for measurements. Also, severe stereo head motion-vibration occasionally occurs, producing blurred images when the head moves concurrently with the base. Elimination of the need for several image formats is a relatively straightforward task, and the existing software modules will be modified to accept a single standard image format in the near future. The stereo head vibrations appear to be a structural problem resulting from natural harmonic motions; the purchase of an improved head may be the simplest solution to this problem. With faster computers, it is not unreasonable to expect measurement updates at better than eight per second. Of course these time delays do not arise if the mobile robot utilises the navigational mode that relies solely on odometry. For most indoor environments, where floor levels are even and wheel slippage negligible, this simpler mode of operation is acceptable and quite reliable.

Currently only one scan and search sequence has been implemented. Effectively, the target object must reside on a surface in close proximity to the specified destination. An experiment with multiple scan and search sequences has been done: the mobile robot was guided around the perimeter of a table while searching for an object. This proved successful when the object was eventually located at the far end of the table. The ability to move around the boundary area of benches and other objects will require additional navigational algorithms to be incorporated, to determine and negotiate complex path trajectories, a challenging future project.

In the more immediate future, it is intended to improve on the current method of object recognition. One simple improvement would be to add colour to the object feature list. This could greatly help discriminate between objects of similar shape (i.e., a white cup instead of a green cup). More complex model building algorithms will also be considered, perhaps using Generalised Cylinders or Octrees.

The use of a separate "slave transport" mobile base has both potential advantages and drawbacks. Placing a robot arm on the master mobile base was not a feasible option with the current equipment, due to size and weight constraints. The advantage of using two vehicles is that a relatively lightweight, fast acting, low energy mobile base, equipped with active cameras, can initially be used to navigate about in a search, recognition and location phase. This is followed by a second phase, during which the slower mobile-crane-like vehicle is instructed to automatically dock with the "master" and is then guided to pick up the object (a "brain and brawn" collaboration). Substantial progress on the design and construction of the "transport" mobile base has been made (details are given in appendix B).

Adding "gesture recognition" capabilities to MROLR has also been considered. This would build on the work of Wingate and Stoica relating to Apprentice Robots [WS96], [WS97]. A master could point towards the target-object, thus making it easier to request the retrieval of an object. There is little doubt that as the methods of communicating with and directing robots become more "natural" for humans, such as with speech and hand gestures, the use and appeal of robots will grow. Speech control trials on several robots have already been undertaken by Moyssidis and Wingate [MW02].

As acknowledged in chapter 7, the sole dependence on a transform of high precision to guide the robot arm is a fundamental weakness of the retrieval system. In effect the object retrieval system is an open-loop one. An improvement in the form of visual feedback (visual servoing) to make the retrieval process more effective and robust was suggested in chapter 7. This could be accomplished by more effective use of the existing stereo head and/or with the addition of a miniature camera attached to the gripper (i.e., an eye in the hand). The concept of using cameras attached to robots is not new. Researchers such as Smith et al [SBP97], Hauck et al [HPRSF99], Wijesoma et al [WWR93] and others have successfully utilised them for end effector guidance. A future software module and micro camera could be added to incorporate direct "hand-eye" coordination.

In their current form the simulation modules described in Chapter 6 would be useful for teaching and demonstration purposes. A possible extension to the simulation modules (briefly mentioned in Chapter 6) would be for all created objects to be positioned in the "virtual" scene (in close proximity to their real world x, y, z locations). Mouse selecting the target object (or verbally selecting it, if voice recognition were added) could then initiate a simulation run. If the simulation run proved successful, this could be linked with the physical task of navigation and object retrieval. On retrieval by the "physical" docked arm, the objects would be removed from the "virtual" scene. This would provide a direct way of target object selection, as well as a visual means of readily establishing the proximity of the desired object position, thus considerably simplifying the navigation, search and retrieval tasks.

8.3 What Follows?

Three appendices follow this final chapter, provided to expand upon the work outlined in the preceding chapters, as well as to outline some interesting preliminary work undertaken in search of suitable algorithms to meet the needs and objectives of this thesis.

Appendix A describes early approaches and methods considered and evaluated, but ultimately not followed.

Appendix B provides specific details of ancillary software and hardware developed in support of this continuing work.

Appendix C continues the algorithmic development for continuous tracking of multiple features using active vision commenced in Chapter 5.

A comprehensive Bibliography follows the appendices. This is a collection of references maintained and used by the author.


Appendix A Early Attempts to Obtain Object Modeling Algorithms

A 1.0 Introduction

The work of this thesis commenced in 1995 and at that time, obtaining object boundary outlines with metric information suitable for 3D object modeling and recognition was still relatively in its infancy. The problem is still attracting a great deal of attention from eminent researchers, as is evident from the vast array of publications being produced, for example Z. Zhang and others, [Zha02b], [YZ02], [SZOO], [SNWGH99]. In the process of obtaining an effective recognition module for MROLR a number of novel approaches were considered. Although algorithms were developed and tested, and results published and presented, ultimately these were not utilised in the recognition module.

This section reviews the work carried out and contributions made during that research period.

A 1.1 Structure Recovery of Objects Using Multiple Camera Views.

An early approach using "line features in multiple camera views", Wingate [Win97a], [Win97b], was formulated around the algorithm of Taylor and Kriegman [TK95]. This work differed from that of Taylor and Kriegman in that it did not rely on a priori knowledge of the cameras' focal length.

An abstract from "3D Structure Recovery of Objects Using Line Features in Multiple Camera Views", [Win97a], is reproduced below:

Abstract: The work carried out represented an initial attempt to recover the 3D structure of rigid objects from multiple views. The concept involved images captured from a roving mobile robot equipped with a single camera whose pan and tilt are controllable. The algorithms formulated use straight-line feature correspondence obtained from a minimum of 3 views and an objective function that minimises the squared difference between projected edge segments and image segments. The structure of the object, in terms of a scaled 3D-line drawing, as well as the position of the cameras, is returned and available for use as a scene model. Edge outlines were obtained using a Canny edge detector and straight lines fitted to these using a recursive line fitting algorithm. Results for simulated data are provided using a pin hole camera model and these are compared with results using real images in which line correspondence matching is carried out manually.

A 1.2 Trinocular Stereo

Trinocular stereo vision, rectifying images to enhance epipolar matching, was another method investigated. Algorithms were developed and results compared with those of two-camera stereo. The work performed was presented in Wingate [Win99]. Details are outlined in Section A 1.2.1 below.

A 1.2.1 Trinocular Vision (3 Camera Stereo)

The notion of using trinocular stereo vision is appealing because the additional camera provides a second epipolar line in each image, and the intersection of these lines corresponds to a match. False matches are readily eliminated by verification of correspondences in each image (Fig A 1.2.1 below). Rectification of the images, such that the image planes are coplanar and parallel to the lines joining the focal centres, results in the epipolar lines between images 1 and 2 becoming horizontal, and the epipolar lines between images 1 and 3 becoming vertical. The advantage of using trinocular stereo over binocular stereo comes at the cost of increases in algorithmic complexity and overhead processing time. Ayache [Aya91] states that the rectification of k images necessitates storing k 3x3 matrices, and forming six products, six additions and two divisions per rectified point. R. Jarvis and others [Jar94] have used quad stereo. An advantage of four cameras is that testing for vertical, horizontal and diagonal matches can be performed simultaneously, thereby reducing the likelihood of matching outliers even further.

Fig A 1.2.1 Trinocular stereo: corresponding matches at intersection of epipolar lines


Fig A 1.2. 2 Epipolar lines and corresponding matches following image rectification

Ayache's procedure [Aya91, pp. 30-41] (outlined below) was implemented for trinocular rectification. Initially coding was carried out in Matlab and subsequently in 'C'. The theoretical basis of the approach is outlined in the following sections.

A 1.2. 2 Image Modeling

The standard pin hole camera is used, and for clarity of notation this is redrawn in Fig A 1.2.2.1.


Fig A 1.2.2.1 Standard pin hole camera, optical centre C, image point I focal plane P


The transformation from P to I is modeled by a linear transformation T in projective coordinates.

Letting I* = (U, V, S)^T be the projective coordinates of I and (x, y, z) be the coordinates of P,

I* = (U, V, S)^T = T (x, y, z, 1)^T        (A 1)

where T is a 3 x 4 matrix, generally called the perspective matrix of the camera.

For S ≠ 0, the image coordinates of I are defined by:

I = (u, v)^T = (U/S, V/S)^T        (A 2)

T is generally determined by analysing the image of a calibration grid, where the positions in the image of the intersection points are known with precision. Various established calibration procedures exist for determining T (the Tsai routine [Tsa87] was used for this work). The 3 x 4 matrix T consists of twelve parameters, but these are defined up to a scale factor only. Thus, an additional constraint must be found. The simplest is to assume that t_34 is non-zero and to set it to 1. Following calibration, the 11 parameters comprise the intrinsic and extrinsic parameters of the pinhole model.
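For illustration, a minimal sketch of equations A1 and A2 (the perspective matrix values in the example are arbitrary, not a calibrated MROLR camera):

#include <cstdio>

// Project a 3D point P = (x, y, z) through the 3x4 perspective matrix T
// (equation A1) and normalise by S to obtain image coordinates (equation A2).
bool project(const double T[3][4], const double P[3], double& u, double& v)
{
    double I[3];
    for (int i = 0; i < 3; ++i)
        I[i] = T[i][0]*P[0] + T[i][1]*P[1] + T[i][2]*P[2] + T[i][3];
    if (I[2] == 0.0) return false;   // S must be non-zero
    u = I[0] / I[2];
    v = I[1] / I[2];
    return true;
}

int main()
{
    // Illustrative perspective matrix: focal length 500 pixels, principal point (320, 240).
    double T[3][4] = { {500, 0, 320, 0}, {0, 500, 240, 0}, {0, 0, 1, 0} };
    double P[3] = {0.1, -0.2, 2.0};
    double u, v;
    if (project(T, P, u, v)) printf("u = %.2f, v = %.2f\n", u, v);
    return 0;
}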

For later use, the following notation introduced by Ayache is defined: t_ij is the (i, j) element of T, and t_i is the vector composed of the first three elements of row i of T.

The coordinates of the optical centres C_i = (xc_i, yc_i, zc_i) are obtained by solving the system

0 = T_i (xc_i, yc_i, zc_i, 1)^T        (A 4)


A 1.2.3 Image Rectification

Image rectification is performed by a linear transformation in projective coordinates, normally resulting in either horizontal or vertical conjugate epipolar lines; points with coordinates (u_i, v_i) are transformed to new coordinates (u_i', v_i'). By appropriate choice of coordinate frames, a point (u_1', v_1') in image 1 has the line segment v_2' = v_1' of image 2 as its conjugate epipolar line, Fig A 1.2.3.1.

Fig A 1.2.3.1 Rectification of two images.

Initially, rectification for binocular stereo will be considered and then extended to a trinocular stereo system. To begin, it is necessary to define two new perspective matrices M and N which preserve the two optical centres C_1 and C_2. These facilitate the transformation from coordinates (u_i, v_i) to new coordinates (u_i', v_i') in each image i.

The following constraints on the new perspective matrices M and N are imposed:

1. The optical centres of M and N are C_1 and C_2 respectively (to give a unique match between image points I_i and I_i' before and after rectification).

2. The focal plane of M is identified with that of N (to produce parallel epipolar lines in both images).

3. For any point P (not in the optical plane), the image points I_1' and I_2' obtained by M and N respectively are such that v_1' = v_2'.

Let

M =
[ m_1^T   m_14 ]
[ m_2^T   m_24 ]
[ m_3^T   m_34 ]

N =
[ n_1^T   n_14 ]
[ n_2^T   n_24 ]        (A 5)
[ n_3^T   n_34 ]

The application of these constraints leads to matrices M and N defined by:

M =
[ ((C_1 x C_2) x C_1)^T              0                ]
[ (C_1 x C_2)^T                      0                ]        (A 6)
[ ((C_1 - C_2) x (C_1 x C_2))^T      ||C_1 x C_2||^2  ]

N =
[ ((C_1 x C_2) x C_2)^T              0                ]
[ (C_1 x C_2)^T                      0                ]        (A 7)
[ ((C_1 - C_2) x (C_1 x C_2))^T      ||C_1 x C_2||^2  ]

By considering a point in image 1, I_1(u_1, v_1), that is the projection of a point P(x, y, z) with coordinates:

P = C_1 + λ n        (A 8)

where n is a direction vector of the segment I_1 C_1. The projective coordinates of the new image I_1' of point P are then given by:

I_1'* = (U_1', V_1', S_1')^T = M (C_1 + λ n, 1)^T        (A 9)

Now, since C_1 is the optical centre of M, M (C_1, 1)^T = 0. As a consequence, the computation of I_1'* reduces to:

I_1'* = (U_1', V_1', S_1')^T = M' n        (A 10)

where M' is the 3 x 3 matrix obtained by removing the last column of M. Since n is computed by an affine transformation of the coordinates (u_1, v_1), it is sufficient to put

Q_1 =
[ ((C_1 x C_2) x C_1)^T         ]
[ (C_1 x C_2)^T                 ] [ t_2^1 x t_3^1    t_3^1 x t_1^1    t_1^1 x t_2^1 ]        (A 11)
[ ((C_1 - C_2) x (C_1 x C_2))^T ]

a 3 x 3 matrix, to obtain:

I_1'* = Q_1 (u_1, v_1, 1)^T        (A 12)

Proceeding symmetrically, putting

Q_2 =
[ ((C_1 x C_2) x C_2)^T         ]
[ (C_1 x C_2)^T                 ] [ t_2^2 x t_3^2    t_3^2 x t_1^2    t_1^2 x t_2^2 ]        (A 13)
[ ((C_1 - C_2) x (C_1 x C_2))^T ]

gives:

I_2'* = Q_2 (u_2, v_2, 1)^T        (A 14)

Rectification of images 1 and 2 thus reduces to the application of the two linear transformations in projective coordinates given by equations A12 and A14.
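In practice each pixel is simply remapped by its 3 x 3 rectification matrix; a minimal sketch of that per-point operation (illustrative only, not Ayache's implementation):

#include <cstdio>

// Apply a 3x3 rectification matrix Q to pixel coordinates (u, v),
// i.e. (u', v', s)^T = Q (u, v, 1)^T followed by division by s.
void rectifyPoint(const double Q[3][3], double u, double v, double& ur, double& vr)
{
    double x = Q[0][0]*u + Q[0][1]*v + Q[0][2];
    double y = Q[1][0]*u + Q[1][1]*v + Q[1][2];
    double s = Q[2][0]*u + Q[2][1]*v + Q[2][2];
    ur = x / s;
    vr = y / s;
}

int main()
{
    // Identity rectification leaves the pixel unchanged (sanity check).
    double Q[3][3] = { {1,0,0}, {0,1,0}, {0,0,1} };
    double ur, vr;
    rectifyPoint(Q, 100.0, 50.0, ur, vr);
    printf("%.1f %.1f\n", ur, vr);
    return 0;
}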

The elegance of Ayache's derivation for Q_1 and Q_2 is that it readily extends to multiple cameras, as is evident from the following.

A 1.2.4 Three Cameras

For the case of three cameras, it is preferable to rectify the images to have horizontal epipolar lines between images 1 and 2, and vertical epipolar lines between images 1 and 3. For the analysis, the same steps as used for two cameras are adopted to compute new perspective matrices M, N and Q from the optical centres C_1, C_2 and C_3. This results in:

M =
[ (C_3 x C_1)^T                               0                ]
[ (C_1 x C_2)^T                               0                ]        (A 15)
[ (C_1 x C_2 + C_2 x C_3 + C_3 x C_1)^T      -(C_1, C_2, C_3)  ]

N =
[ (C_2 x C_1)^T                               0                ]
[ (C_1 x C_2)^T                               0                ]        (A 16)
[ (C_1 x C_2 + C_2 x C_3 + C_3 x C_1)^T      -(C_1, C_2, C_3)  ]

Q =
[ (C_3 x C_1)^T                               0                ]
[ (C_2 x C_3)^T                               0                ]        (A 17)
[ (C_1 x C_2 + C_2 x C_3 + C_3 x C_1)^T      -(C_1, C_2, C_3)  ]

where (C_1, C_2, C_3) is the triple product of the vectors C_1, C_2 and C_3.

In equations A15 to A17, Q is chosen to produce equality between the ordinate v_3' of I_3' and the abscissa u_2' of I_2', i.e., v_3' = u_2'.

Rectification is performed in exactly the same way as previously, but now with three 3 x 3 rectification matrices R_1, R_2 and R_3.

R_i =
[ (C_{i-1} x C_i)^T                       ]
[ (C_i x C_{i+1})^T                       ] [ t_2^i x t_3^i    t_3^i x t_1^i    t_1^i x t_2^i ]        (A 18)
[ (C_1 x C_2 + C_2 x C_3 + C_3 x C_1)^T   ]

Note that for R_i the convention i + 1 = 1 if i = 3 and i - 1 = 3 if i = 1 applies.

Thus for three homologous points I_1', I_2' and I_3',

v_2' = v_1'
u_3' = u_1'        (A 19)
v_3' = u_2'

A 1.2.5 3D Reconstruction

In principle, a knowledge of T_1 and T_2 is sufficient to compute the three coordinates of any point P, given its two images I_1 and I_2. However, Ayache points out that in the absence of an objective criterion, solving the whole system either by least squares or by Kalman filtering is desirable. Either approach extends naturally to trinocular stereo vision, and more generally to reconstruction based on an arbitrary number of cameras.

For this work least squares was used, and the solution for n cameras follows.

For n cameras, set a = (x, y, z)^T and

A =
[ A_1 ]        b = [ b_1 ]
[  .  ]            [  .  ]        (A 20)
[ A_n ]            [ b_n ]

for which

A a = b        (A 21)

with

A_i =
[ (t_1^i - u_i t_3^i)^T ]        b_i = [ u_i t_34^i - t_14^i ]        (A 22)
[ (t_2^i - v_i t_3^i)^T ]              [ v_i t_34^i - t_24^i ]

The least squares solution is then given by

a = (A^T A)^{-1} A^T b        (A 23)

provided A^T A is invertible.
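A compact sketch of the solution of equation A23 (illustrative only; A and b are assumed to have been assembled row-by-row as in equations A20 to A22):

#include <cstddef>
#include <cstdio>
#include <vector>

// Solve a = (A^T A)^{-1} A^T b for the 3D point a = (x, y, z)^T, where A is
// (2n x 3, stored row-major) and b is (2n x 1).  Returns false if A^T A is singular.
bool leastSquaresPoint(const std::vector<double>& A, const std::vector<double>& b, double a[3])
{
    const std::size_t rows = b.size();
    double N[3][3] = {{0}}, c[3] = {0};
    for (std::size_t r = 0; r < rows; ++r)
        for (int i = 0; i < 3; ++i) {
            c[i] += A[3*r + i] * b[r];
            for (int j = 0; j < 3; ++j) N[i][j] += A[3*r + i] * A[3*r + j];
        }
    // Invert the 3x3 normal matrix N via its adjugate.
    double det = N[0][0]*(N[1][1]*N[2][2]-N[1][2]*N[2][1])
               - N[0][1]*(N[1][0]*N[2][2]-N[1][2]*N[2][0])
               + N[0][2]*(N[1][0]*N[2][1]-N[1][1]*N[2][0]);
    if (det == 0.0) return false;
    double inv[3][3] = {
        { (N[1][1]*N[2][2]-N[1][2]*N[2][1])/det, (N[0][2]*N[2][1]-N[0][1]*N[2][2])/det, (N[0][1]*N[1][2]-N[0][2]*N[1][1])/det },
        { (N[1][2]*N[2][0]-N[1][0]*N[2][2])/det, (N[0][0]*N[2][2]-N[0][2]*N[2][0])/det, (N[0][2]*N[1][0]-N[0][0]*N[1][2])/det },
        { (N[1][0]*N[2][1]-N[1][1]*N[2][0])/det, (N[0][1]*N[2][0]-N[0][0]*N[2][1])/det, (N[0][0]*N[1][1]-N[0][1]*N[1][0])/det } };
    for (int i = 0; i < 3; ++i)
        a[i] = inv[i][0]*c[0] + inv[i][1]*c[1] + inv[i][2]*c[2];
    return true;
}

int main()
{
    // Toy example: four rows constraining the point (1, 2, 3).
    std::vector<double> A = { 1,0,0,  0,1,0,  0,0,1,  1,1,1 };
    std::vector<double> b = { 1, 2, 3, 6 };
    double a[3];
    if (leastSquaresPoint(A, b, a)) printf("x=%.3f y=%.3f z=%.3f\n", a[0], a[1], a[2]);
    return 0;
}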

Fig A 1.2.5.1 4 Degree of freedom stereo head with third camera mounted

An example of results obtained using three Pulnix colour cameras is given below.


Fig A 1.2.5. 2 Left, centre and right camera images of a scene

Fig A 1.2.5. 3 Resulting 3D trinocular reconstruction


A 1.3 Cluster Prototype Centring by Membership (CPCM)

A fuzzy probabilistic method for object boundary detection, resulting in the development of a new progressive fuzzy clustering algorithm, was also tested. The algorithm was given the acronym CPCM (Cluster Prototype Centring by Membership). Full details and results were published in [IWQ95], [ILW96].

The extent of this work is best summarized by the following extract from P. T. Im [Im97]:

"This work applies an enhanced progressive clustering approach, involving clustering algorithms and fuzzy neural networks, to solve some practical problems of pattern recognition. A new fuzzy clustering framework referred to as Cluster Prototype Centring by Membership (CPCM) has been developed. A Possiblistic Fuzzy c-Means algorithm (PFCM), which is also new, has been formulated to investigate properties of fuzzy clustering. PFCM extends the useability of the Fuzzy c-Means (FCM) algorithm by generalisation of the membership function. CPCM provides a flexible framework to integrate clustering methods that detect cluster substructures. Application development is focused on three problem contexts: (i) detection of contaminants in wool and paint defect on tile surface (region segmentation), (ii) identification of real object lines and circles (boundary detection) and (iii) recognition of a notched feature on an armature housing (general pattern recognition). Results obtained from these algorithms indicate robust clustering and accurate identification of cluster parameters (circle centre, radius, line gradient and corners) from real data silhouettes characterised by the presence of noise, fragmentation and partial obscurity."

The bulk of this work was carried out by P. T. Im with the aid and supervision of M. Wingate. The following papers on CPCM and applications were co-authored by M. Wingate: [ILW96], [IQWH95], [IQWH95b], [IWQ95].


Appendix B Hardware and Software

B 1.0 Introduction

To clarify and extend information provided so far in various chapters, specific details relating to ancillary hardware and software developed is provided in this appendix.

MROLR's system software comprises of numerous fritegrated platform modules (Linux, Windows, DOS) and many megabites of 'C/C++' based source code. Much of it is hardware specific and/or proprietary produced (i.e., written for the mobile-base confroller, active stereo head-controller, robot arm-controller, etc.) and is of little use for other systems, or is not licensed for inclusion in this appendix. Ultimately, it is expected that source code vmtten specifically for this project will be available on a CD for interested parties.

Accurate 3D object-model creation is essential for reliable object recognition. Several portable computer-controlled rotating tables needed to be designed and constructed to facilitate the object-model creation discussed in chapters 6 and 7. The motion controller requirements for the (partially complete, proposed) mobile transport base are similar to those of the rotating tables; thus, for standardisation and ease of manufacture, identical designs were used. Section B 2.0 gives the circuit schematics of the controller and details of the electronic boards constructed. Specific details of the procedure for 3D object-model creation and its implementation in software are outlined in section B 2.1.

Kinematic analysis and solutions for a large range of robotic arms are well documented and understood. Driver binaries for the RTX UMI robot were provided by the manufacturer; however, incorporating kinematic and inverse kinematic solutions into the MROLR Simulation Module required the development and coding of the relevant mathematical solutions. The forward kinematic and inverse kinematic solutions coded for the RTX robot arm are detailed in section B 3.

B 2.0 Rotating Table and Mobile Transport Robot Controller Circuit Design

Motion controllers designed specifically for MROLR were standardised around the Motorola HC11 microprocessor and the National Semiconductor LM629 motion-control IC. Thus, the controllers for the model-creating turn tables are the same as those used for the mobile transport base, except for the number of motor drivers utilised. All motors are fitted with precision optical encoders, permitting velocity and acceleration against time profiles to be specified. This control is accomplished using the LM629 32-bit motion controller; see the list of features below.


LM629 features

• 32-bit position, velocity, and acceleration registers
• Programmable digital PID filter with 16-bit coefficients
• Programmable derivative sampling interval
• 8-bit sign-magnitude PWM output data
• Internal trapezoidal velocity profile generator
• Velocity, target position, and filter parameters may be changed during motion
• Position and velocity modes of operation
• Real-time programmable host interrupts
• 8-bit parallel asynchronous host interface
• Quadrature incremental encoder interface with index pulse input

Circuit hardware and schematic details are shown in Figs B.2.0.3 to B.2.0.9. Interface to PCs is via an RS232 serial port.
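As an illustration of the kind of trapezoidal profile the LM629 executes, the C sketch below converts a demanded table rotation into encoder counts and a symmetric velocity profile. The counts-per-revolution, velocity and acceleration figures are assumptions for illustration, not measured MROLR values, and no LM629 register programming is shown.

#include <math.h>
#include <stdio.h>

#define COUNTS_PER_REV 2000.0     /* quadrature counts per table revolution (assumed) */

int main(void)
{
    double angle_deg = 45.0;                      /* one model-building increment      */
    double counts    = angle_deg / 360.0 * COUNTS_PER_REV;
    double v_max     = 500.0;                     /* counts per second (assumed)       */
    double accel     = 1000.0;                    /* counts per second^2 (assumed)     */

    double t_acc = v_max / accel;                 /* time to reach cruise velocity     */
    double d_acc = 0.5 * accel * t_acc * t_acc;   /* distance covered while ramping    */
    double t_cruise;

    if (2.0 * d_acc >= counts) {                  /* short move: triangular profile    */
        t_acc = sqrt(counts / accel);
        t_cruise = 0.0;
    } else {
        t_cruise = (counts - 2.0 * d_acc) / v_max;
    }
    printf("move %.0f counts: ramp %.3f s, cruise %.3f s, ramp down %.3f s\n",
           counts, t_acc, t_cruise, t_acc);
    return 0;
}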

Two portable model-creating tables were built: one robust, with level-adjusting legs to ensure a precise horizontal planar surface, the other lighter and more portable for ease of transport to different locations. Prior to the integration of the single-view models formed, a rotation of each model about the Z-axis of the table, equivalent to minus the angle the table was rotated, is required. The closer to horizontal the table surface is, the better the model produced. Both motorised tables, previously shown in Fig 6.1.5 (a) and (b), are redisplayed reduced in Fig B.2.0.1.

The mobile transport base (Fig 6.7.2) was designed and constructed to carry the robot arm and to ultimately dock with the scout master mobile base, as outlined in earlier chapters. In its present incarnation it is able to move accurately to specified coordinates. This is demonstrated in the accompanying CD, which shows the transport base carrying the robot arm. For convenience, Fig 6.7.2 is also redisplayed reduced in Fig B.2.0.2.

Wireless communication and laser-controlled docking still need to be added to make this facility a useful component of MROLR.

(The LM629 feature list above is extracted from National Semiconductor application notes.)


Fig B.2.0.1 Portable motorised turn tables

Fig B.2.0.2 Mobile Transport base and Arm


(a) Double layer, microprocessor and precision motor controller circuit board. This board comprises 5 basic circuit sections; schematic circuit details are given in Figs B.2.0.4 to B.2.0.8.

(b) Motor Driver Board: schematic circuit details are given in Fig B.2.0.9.

Fig B.2.0.3 Controller hardware

Fig B.2.0.4 MC68HC11 Microprocessor and interface circuit

Fig B.2.0.5 Flash PSD memory circuit


Fig B.2.0.6 Motion controller circuit, utilises LM629 motion controllers


Fig B.2.0.7 RS232 Serial communication interface circuit


Fig B.2.0.8 Power and ports circuit


Fig B.2.0.9 Motor driver circuit


B 2.1 Model Creating Procedure Details

The procedure for creating a model is as follows:

3D edge models of each rotated view are automatically provided via the stereo head, computerised table, and modified TINA software. These are Model.i.poly, with i incrementing from 1 to 7, respectively representing edge-model views at 0, 45, 90, 135, 180, 225, 270, 315 degree rotations about the Z axis of the table (i.e., the world coordinate frame). Each view comprises 3D line and 3D conic edge segment location descriptors referenced to the Left Camera coordinate frame. Left Camera homogeneous transforms TCWL and TWCL are established to facilitate transformations of "3D segments" to either the world coordinate frame or the Left Camera coordinate frame respectively. (Note: the notation CW signifies camera measurements transformed to the world coordinate frame, while WC signifies world coordinate measurements transformed to the camera coordinate frame.) TCWL comprises TCWL.r (3x3 matrix of the camera rotation parameters with respect to the world coordinate frame) and TCWL.t (3x1 matrix of the camera translation parameters with respect to the world coordinate frame). The inverse camera transformation structure TWCL comprises TWCL.r and TWCL.t (rotations and translations with reference to the Left Camera coordinate frame). Rotation and translation matrix data are obtained from prior Tsai camera calibration results.

A 3x3 matrix transform T45Z, for producing 45-degree (clockwise) rotations about the world coordinate frame Z-axis, was also established.

B 2.1.1 Pseudocode: Make_model_proc()

/* Convert 3D segments in the poly buffers from the left camera to the world coordinate frame */
For i = 1 to 7 apply TCWL to Model.i.poly

/* Rotate poly segments (clockwise) about the world Z-axis an amount consistent with the respective table (anti-clockwise) rotation (each table rotation increment = 45 degrees). Append rotated segments to Model.1.poly */

For i = 2 to 7 apply (i*T45Z) to Model.i.poly and append the results to Model.1.poly

/* Finally convert the 3D segments in the .poly model back to the Left Camera coordinate frame, as TINA TV displays camera coordinate views */

Apply TWCL to Model.1.poly

Display the resulting Model.1.poly

Note that for most cases only 4 models (instead of 7) are necessary. For these cases i = 1 to 4, and a T90Z = 2 x T45Z is used, etc.

Notes: TCWL is the 4x4 homogeneous transform of the Left Camera w.r.t. the World Coordinate Frame. The Tsai calibration routine is provided in TINA's Calibration tool.
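The following C sketch illustrates the transform steps of the pseudocode above applied to a single 3D point (e.g. a segment endpoint). It is not the TINA-based MROLR implementation; the Transform structure and helper names are assumptions for illustration, with TCWL/TWCL holding the rotation (.r) and translation (.t) components described above.

#include <math.h>

typedef struct { double r[3][3]; double t[3]; } Transform;   /* e.g. TCWL or TWCL */

static void apply_transform(const Transform *T, const double in[3], double out[3])
{
    int i, j;
    for (i = 0; i < 3; i++) {
        out[i] = T->t[i];
        for (j = 0; j < 3; j++)
            out[i] += T->r[i][j] * in[j];
    }
}

/* Rotate a world-frame point about the table Z axis by minus the angle
   the table was rotated for this view (angle given in degrees). */
static void undo_table_rotation(double table_angle_deg, const double in[3], double out[3])
{
    const double pi = 3.14159265358979;
    double ang = -table_angle_deg * pi / 180.0;
    out[0] = cos(ang) * in[0] - sin(ang) * in[1];
    out[1] = sin(ang) * in[0] + cos(ang) * in[1];
    out[2] = in[2];
}

In use, each endpoint of a view's 3D segments would be passed through apply_transform() with TCWL, then through undo_table_rotation() with that view's table angle, appended to the composite model, and finally mapped back with TWCL for display.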


B 3.0 Forward Kinematics and Inverse Kinematics of the UMI RTX 6 DOF Robot

Fig B 3.0.1 DH link coordinate systems and joint assignment for the RTX robot arm

B 3.1 Denavit-Hartenberg Representation (D-H)

Coordinate frames:
1. The z_(i-1) axis lies along the axis of motion of the i-th joint.
2. The x_i axis is normal to the z_(i-1) axis and points away from it.
3. The y_i axis completes the right-hand coordinate system as required.

Link parameters:
1. θ_i is the joint angle from the x_(i-1) axis to the x_i axis about the z_(i-1) axis (using the right-hand rule).
2. d_i is the distance from the origin of the (i-1)-th frame to the intersection of the z_(i-1) axis with the x_i axis, measured along the z_(i-1) axis.
3. a_i is the shortest distance between the z_(i-1) and z_i axes (the offset distance from the intersection of the z_(i-1) axis with the x_i axis to the origin of the i-th frame, measured along the x_i axis).
4. α_i is the offset angle from the z_(i-1) axis to the z_i axis about the x_i axis (using the right-hand rule).


(i-1)A_i = T(z,d) R(z,θ) T(x,a) R(x,α) =

    [ cosθ_i   -cosα_i sinθ_i    sinα_i sinθ_i    a_i cosθ_i ]
    [ sinθ_i    cosα_i cosθ_i   -sinα_i cosθ_i    a_i sinθ_i ]
    [   0           sinα_i           cosα_i           d_i    ]
    [   0             0                0                1    ]
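A small C helper (illustrative only; the function name is an assumption) that fills the above link transform from the four D-H parameters is given below; it is used in the numerical check later in this section.

#include <math.h>

/* Build the homogeneous link transform (i-1)A_i from (theta, d, a, alpha),
   angles in radians, following the matrix printed above. */
void dh_matrix(double theta, double d, double a, double alpha, double A[4][4])
{
    double ct = cos(theta), st = sin(theta);
    double ca = cos(alpha), sa = sin(alpha);

    A[0][0] = ct;  A[0][1] = -ca * st; A[0][2] =  sa * st; A[0][3] = a * ct;
    A[1][0] = st;  A[1][1] =  ca * ct; A[1][2] = -sa * ct; A[1][3] = a * st;
    A[2][0] = 0;   A[2][1] =  sa;      A[2][2] =  ca;      A[2][3] = d;
    A[3][0] = 0;   A[3][1] =  0;       A[3][2] =  0;       A[3][3] = 1;
}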

Link   θ_i     d_i       a_(i-1)   α_(i-1)
  1    0       var**     0         0
  2    var*    0         254 mm    0
  3    var*    0         254 mm    90°
  4    var*    0         0         90°
  5    var*    0         0         0
  6    var*    177 mm    0         0

Note: var = variable; ** signifies a prismatic joint, while * signifies a rotary joint.

Initial variable joint values are: d1 = 922 mm, θ2 = θ3 = θ4 = 0°, θ5 = 90°, θ6 = 0°.

Fig B 3.1.2 Denavit-Hartenberg link/joint parameter table

The A Matrices

(Note: q is used in place of θ.)

A[1] =
  Cos[q1]  -Sin[q1]   0     0
  Sin[q1]   Cos[q1]   0     0
  0         0         1     d1
  0         0         0     1

A[2] =
  Cos[q2]  -Sin[q2]   0     254 Cos[q2]
  Sin[q2]   Cos[q2]   0     254 Sin[q2]
  0         0         1     0
  0         0         0     1

A[3] =
  Cos[q3]  -Sin[q3]   0     254 Cos[q3]
  Sin[q3]   Cos[q3]   0     254 Sin[q3]
  0         0         1     0
  0         0         0     1

A[4] =
  Cos[q4]   0    Sin[q4]    0
  Sin[q4]   0   -Cos[q4]    0
  0         1    0          0
  0         0    0          1

A[5] =
  Cos[q5]   0    Sin[q5]    0
  Sin[q5]   0   -Cos[q5]    0
  0         1    0          0
  0         0    0          1

A[6] =
  Cos[q6]  -Sin[q6]   0     0
  Sin[q6]   Cos[q6]   0     0
  0         0         1     177
  0         0         0     1

B 3.2 Forward Kinematic Solution T06

The forward kinematic solution T06 is obtained from:

T06 = A[1] A[2] A[3] A[4] A[5] A[6] =
  n_x   o_x   a_x   p_x
  n_y   o_y   a_y   p_y
  n_z   o_z   a_z   p_z
  0     0     0     1

Because of its size, T06 is printed element by element below. Note also S1 = Sin(q1), C1 = Cos(q1), etc.

T06[1,1] = -(S1 (S2 (-(S3 (C5 C6 S4 - C4 S6)) + C3 (C4 C5 C6 + S4 S6)) +
           C2 (C3 (C5 C6 S4 - C4 S6) + S3 (C4 C5 C6 + S4 S6)))) +
           C1 (C2 (-(S3 (C5 C6 S4 - C4 S6)) + C3 (C4 C5 C6 + S4 S6)) -
           S2 (C3 (C5 C6 S4 - C4 S6) + S3 (C4 C5 C6 + S4 S6)))

T06[2,1] = C1 (S2 (-(S3 (C5 C6 S4 - C4 S6)) + C3 (C4 C5 C6 + S4 S6)) +
           C2 (C3 (C5 C6 S4 - C4 S6) + S3 (C4 C5 C6 + S4 S6))) +
           S1 (C2 (-(S3 (C5 C6 S4 - C4 S6)) + C3 (C4 C5 C6 + S4 S6)) -
           S2 (C3 (C5 C6 S4 - C4 S6) + S3 (C4 C5 C6 + S4 S6)))

T06[3,1] = C6 S5
T06[4,1] = 0

T06[1,2] = C1 (-(S2 (S3 (C6 S4 - C4 C5 S6) + C3 (-(C4 C6) - C5 S4 S6))) +
           C2 (C3 (C6 S4 - C4 C5 S6) - S3 (-(C4 C6) - C5 S4 S6))) -
           S1 (C2 (S3 (C6 S4 - C4 C5 S6) + C3 (-(C4 C6) - C5 S4 S6)) +
           S2 (C3 (C6 S4 - C4 C5 S6) - S3 (-(C4 C6) - C5 S4 S6)))

T06[2,2] = S1 (-(S2 (S3 (C6 S4 - C4 C5 S6) + C3 (-(C4 C6) - C5 S4 S6))) +
           C2 (C3 (C6 S4 - C4 C5 S6) - S3 (-(C4 C6) - C5 S4 S6))) +
           C1 (C2 (S3 (C6 S4 - C4 C5 S6) + C3 (-(C4 C6) - C5 S4 S6)) +
           S2 (C3 (C6 S4 - C4 C5 S6) - S3 (-(C4 C6) - C5 S4 S6)))

T06[3,2] = -(S5 S6)
T06[4,2] = 0

T06[1,3] = Cos[q1 + q2 + q3 + q4] S5
T06[2,3] = Sin[q1 + q2 + q3 + q4] S5
T06[3,3] = -C5
T06[4,3] = 0

T06[1,4] = -(S1 (254 S2 + S2 (254 C3 + 177 C3 C4 S5 - 177 S3 S4 S5) +
           C2 (254 S3 + 177 S34 S5))) +
           C1 (254 C2 + C2 (254 C3 + 177 C3 C4 S5 - 177 S3 S4 S5) -
           S2 (254 S3 + 177 S34 S5))

T06[2,4] = C1 (254 S2 + S2 (254 C3 + 177 C3 C4 S5 - 177 S3 S4 S5) +
           C2 (254 S3 + 177 S34 S5)) +
           S1 (254 C2 + C2 (254 C3 + 177 C3 C4 S5 - 177 S3 S4 S5) -
           S2 (254 S3 + 177 S34 S5))

T06[3,4] = d1 - 177 C5
T06[4,4] = 1
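As a simple numerical cross-check of the symbolic solution (illustrative only, not part of the thesis software; joint values and helper names are assumptions), the six A matrices listed in section B 3.1 can be multiplied together and the resulting position compared against the closed-form elements above, e.g. T06[3,4] = d1 - 177 C5.

#include <math.h>
#include <stdio.h>
#include <string.h>

typedef double Mat4[4][4];

static void matmul4(double a[4][4], double b[4][4], double out[4][4])
{
    int i, j, k;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++) {
            out[i][j] = 0.0;
            for (k = 0; k < 4; k++) out[i][j] += a[i][k] * b[k][j];
        }
}

static void link_z(double q, double a, double d, Mat4 m)      /* alpha = 0 form  */
{
    Mat4 t = {{cos(q), -sin(q), 0, a * cos(q)},
              {sin(q),  cos(q), 0, a * sin(q)},
              {0, 0, 1, d}, {0, 0, 0, 1}};
    memcpy(m, t, sizeof(Mat4));
}

static void link_z_twist(double q, Mat4 m)                    /* alpha = 90 deg form */
{
    Mat4 t = {{cos(q), 0,  sin(q), 0},
              {sin(q), 0, -cos(q), 0},
              {0, 1, 0, 0}, {0, 0, 0, 1}};
    memcpy(m, t, sizeof(Mat4));
}

int main(void)
{
    const double pi = 3.14159265358979;
    double q1 = 0, q2 = 0, q3 = 0, q4 = 0, q5 = pi / 2, q6 = 0, d1 = 922.0;
    Mat4 A1, A2, A3, A4, A5, A6, T, tmp;

    link_z(q1, 0.0, d1, A1);
    link_z(q2, 254.0, 0.0, A2);
    link_z(q3, 254.0, 0.0, A3);
    link_z_twist(q4, A4);
    link_z_twist(q5, A5);
    link_z(q6, 0.0, 177.0, A6);

    matmul4(A1, A2, tmp); matmul4(tmp, A3, T);
    matmul4(T, A4, tmp);  matmul4(tmp, A5, T);
    matmul4(T, A6, tmp);                       /* tmp now holds T06 */

    /* p_z should equal d1 - 177 cos(q5), as in T06[3,4] above */
    printf("p = (%.1f, %.1f, %.1f) mm\n", tmp[0][3], tmp[1][3], tmp[2][3]);
    return 0;
}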

B 3.3 Inverse Kinematics

The above transforms may be used to obtain the inverse kinematic solution for the robot arm. Alternatively, the inverse kinematic solution may be obtained by considering the joint movement geometry directly. This latter geometrical approach was taken, and the resulting function (coded in C) is given in Section B 3.3.1 below. A more general solution (suitable for arms with 5 or more degrees of freedom) was also adopted. This extends the use of the simulator to include arms with a variety of configurations and joints (i.e., articulated, spherical, cylindrical joints). This second solution uses the pseudo-inverse of the manipulator's Jacobian and is completely general. Recall the forward kinematic solution is:

    T0n = A[1] A[2] A[3] ... A[n] = K(q),

Appendix B Hardware and Software 172

The inverse kinematic solution is q = K^(-1)(T), where q is a vector array of the variable joint parameters.

Because the solution is obtained iteratively, it is less efficient than specific inverse kinematic solutions derived symbolically or from direct joint geometry. A further drawback is that, while this approach allows a solution to be obtained at a singularity, the joint angles within the null space are arbitrarily assigned.

The code for the pseudo-inverse was ported to 'C' from Peter Corke's Matlab "Robotics Toolbox", publicly available from www.brb.dmt.csiro.au/dmt/programs/autom/pic/matlab.html
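To make the iterative idea concrete, the sketch below applies the resolved-rate update q <- q + J^+(q) e to a planar 2-link arm; for a square, non-singular Jacobian the pseudo-inverse reduces to the ordinary inverse. This is a minimal illustration under assumed link lengths and function names, not the ported Toolbox code.

#include <math.h>
#include <stdio.h>

#define L1 0.254   /* link lengths (m), matching the RTX 254 mm links */
#define L2 0.254

static void fk(const double q[2], double x[2])
{
    x[0] = L1 * cos(q[0]) + L2 * cos(q[0] + q[1]);
    x[1] = L1 * sin(q[0]) + L2 * sin(q[0] + q[1]);
}

/* One iteration of q <- q + J^-1(q) * (target - fk(q)).
   Returns 1 on convergence, 0 to continue, -1 near a singularity. */
static int ik_step(double q[2], const double target[2])
{
    double x[2], e[2], J[2][2], inv[2][2], det;
    fk(q, x);
    e[0] = target[0] - x[0];
    e[1] = target[1] - x[1];
    J[0][0] = -L1 * sin(q[0]) - L2 * sin(q[0] + q[1]);
    J[0][1] = -L2 * sin(q[0] + q[1]);
    J[1][0] =  L1 * cos(q[0]) + L2 * cos(q[0] + q[1]);
    J[1][1] =  L2 * cos(q[0] + q[1]);
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
    if (fabs(det) < 1e-9) return -1;
    inv[0][0] =  J[1][1] / det; inv[0][1] = -J[0][1] / det;
    inv[1][0] = -J[1][0] / det; inv[1][1] =  J[0][0] / det;
    q[0] += inv[0][0] * e[0] + inv[0][1] * e[1];
    q[1] += inv[1][0] * e[0] + inv[1][1] * e[1];
    return (fabs(e[0]) < 1e-6 && fabs(e[1]) < 1e-6) ? 1 : 0;
}

int main(void)
{
    double q[2] = {0.3, 0.3};              /* initial joint guess (rad) */
    double target[2] = {0.30, 0.25};       /* desired end-point (m)     */
    int i, done = 0;
    for (i = 0; i < 100 && done == 0; i++)
        done = ik_step(q, target);
    printf("q1=%f q2=%f (converged=%d)\n", q[0], q[1], done);
    return 0;
}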


B 3.3.1 'C' function to compute the inverse kinematic solution for the RTX robot arm

(Note: portions of this code were ported from Pascal code generously provided by Geoff West of Curtin University, W.A., Australia.)

int compute_xyz(double *x_coord, double *y_coord, double *z_coord,
                double *pitch, double *roll, double *yaw,
                int *s_ec, int *e_ec, int *z_ec, int *w1_ec, int *w2_ec, int *wy_ec)
{
    double x_wrist, y_wrist, z_wrist;
    double theta, phi, radius, opp, r_xy;
    int flag;

    /* Compute roll, pitch and yaw motor counts.
       pitch - rotate about -Z (was x); roll - rotate about X (was y);
       yaw - rotate about Y (was z); all w.r.t. the wrist.
       w1_ec/w2_ec - wrist motor 1/2 encoder counts; s_ec - shoulder motor;
       e_ec - elbow motor; z_ec - column; wy_ec - wrist yaw.
       (rads(), print_to_screen(), arm_l, z_orig and pi are defined elsewhere
       in the MROLR source.) */
    *w1_ec = (int)(-*pitch / 0.07415);
    *w2_ec = *w1_ec;
    *w1_ec = *w1_ec + (int)(*roll / 0.07415);
    *w2_ec = *w2_ec - (int)(*roll / 0.07415);

    /* Compute the position of the wrist from the position of the gripper;
       this depends on the pitch, roll and yaw angles. */
    r_xy    = 177 * cos(rads(*pitch));
    x_wrist = *x_coord - (r_xy * sin(rads(*yaw)));
    y_wrist = *y_coord - (r_xy * cos(rads(*yaw)));
    z_wrist = *z_coord + (177 * sin(rads(*pitch)));

    /* Note: default configuration is LH. */
    flag = 0;
    *z_ec = (int)((z_wrist - z_orig) / 0.2667);
    if ((*z_ec > 0) || (*z_ec < -3554))
        flag = 1;

    radius = sqrt((x_wrist * x_wrist) + (y_wrist * y_wrist));
    if (radius > 507) {
        print_to_screen("ERROR attempt to place wrist outside the limits! radius=%f\n", radius);
        print_to_screen("will jump out of program");
    } else {
        opp = sqrt((arm_l * arm_l) - ((radius / 2) * (radius / 2)));
        if (radius < 0.0001)
            theta = 90;
        else
            theta = atan(opp * 2 / radius) * 360.0 / (2 * pi);

        *s_ec = (int)(theta / 0.03422);
        *e_ec = (int)((-theta / 0.06844) * 2);
        if ((y_wrist < 0.0001) && (x_wrist > 0))
            phi = 90;
        else if ((y_wrist < 0.0001) && (x_wrist <= 0))
            phi = -90;
        else
            phi = atan(x_wrist / y_wrist) * 360 / (pi * 2);

        print_to_screen("phi= %f \n", phi);
        *s_ec = *s_ec - (int)(phi / 0.03422);

        /* Compute the required yaw angle w.r.t. the position determined by
           the angle of the arm with the y axis. */
        *wy_ec = -(int)((*yaw - phi) / 0.10267);

        /* If limits exceeded, try the RH configuration. */
        if ((*s_ec < -2630) || (*s_ec > 2630) || (*e_ec < -2630) || (*e_ec > 2206)) {
            *s_ec = -(int)(theta / 0.0342);
            *e_ec = -(int)((-theta / 0.06844) * 2);
            *s_ec = *s_ec - (int)(phi / 0.03422);
        }
        if ((*s_ec < -2630) || (*s_ec > 2630) || (*e_ec < -2630) || (*e_ec > 2206))
            flag = 1;
        if (flag == 1) {
            print_to_screen("WARNING- arm will exceed limits if you continue \n");
            print_to_screen("z_ec=%d, s_ec=%d, e_ec=%d \n", *z_ec, *s_ec, *e_ec);
            print_to_screen("hit <cr> to continue\n");
        }
    }
    return 0;
}
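A short, hypothetical calling sketch (the target coordinates and angles below are illustrative values only, not tested MROLR poses) shows how the encoder counts are returned through the pointer arguments:

int compute_xyz(double *x_coord, double *y_coord, double *z_coord,
                double *pitch, double *roll, double *yaw,
                int *s_ec, int *e_ec, int *z_ec,
                int *w1_ec, int *w2_ec, int *wy_ec);   /* as defined above */

int main(void)
{
    double x = 300.0, y = 200.0, z = -400.0;       /* hypothetical target (mm)     */
    double pitch = -90.0, roll = 0.0, yaw = 0.0;   /* gripper pointing downwards   */
    int s_ec, e_ec, z_ec, w1_ec, w2_ec, wy_ec;

    compute_xyz(&x, &y, &z, &pitch, &roll, &yaw,
                &s_ec, &e_ec, &z_ec, &w1_ec, &w2_ec, &wy_ec);
    /* the six encoder counts would then be sent to the RTX joint controllers */
    return 0;
}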


Appendix C Development of a Feature Tracking Strategy

C 1.0 Introduction

This appendix builds on the feature tracking analysis commenced in Chapter 5, developing the methodology and strategies for continuous feature tracking while at the same time maintaining a manageably sized feature map. The objective is to maximise the capability of the TRC stereo head to make measurements and provide navigational information.

Davison [Dav98] develops a means of maintaining a large map of landmark features without the computational burden of having to update the full covariance matrices following a measurement. Moreover, the solution permits continuous tracking without compromising the integrity of the filter and, during motion, requires only those parts of the filter that are needed for the current tracking task to be updated.

Davison's derivations are adopted, making the necessary modifications to accommodate the differences in MROLR's control architecture and stereo head outlined in Section 4.2. Building on the expressions (equations 5.1 through 5.52) developed in Chapter 5, the methodology that requires only those parts of the state vector and covariance matrix directly involved in the tracking process at that time to be updated is outlined. These are the estimated states of the sensor and the observed i-th feature, x_v and y_i respectively, and the covariances P_xx, P_xy_i and P_y_iy_i. Recall that P_xy_i is the covariance matrix between the estimated mobile robot state x_v and feature y_i. By storing a small amount of information at each transition step, the state and covariances can be updated in a generic way at the completion of each tracking motion.

To emphasise that the equations developed in Chapter 5 are being extended, the same numbering sequence is used.

For MROLR the system state transition function f (Equation 5.53) comprises the function f_v, which models the head sensor movements in terms of the mobile-base state x_v and the control vector u, and the scene-feature states y_i.

The system analysis can be simplified by imposing the following restrictions:

• during the platform's motion the positions of the scene-features (relative to the world coordinate frame) do not change, and consequently y_i remains constant during a state transition, and

• a scene-feature's measurement depends only on the current states of the sensor and the feature undergoing measurement (equation 5.54).


With the above restrictions the state transition has the form

    f(x, u) = [ f_v(x_v, u) ]
              [     y_1     ]
              [     y_2     ]                                       5.53
              [     ...     ]

and a measurement has the form

    m(x) = m_i(x_v, y_i)                                            5.54

C 5.5.9 System State Prediction

The prediction of the state following a motion, and the corresponding covariance, are readily written down from consideration of equations 5.32 and 5.33. In the following, the "k" time-step notation will be omitted.

    x_new = [ f_v(x_v, u) ]
            [     y_1     ]
            [     y_2     ]                                         5.55
            [     ...     ]

    P_new = [ ∇(f_v)_xv P_xx ∇(f_v)_xv^T + Q   ∇(f_v)_xv P_xy1   ∇(f_v)_xv P_xy2   ... ]
            [ P_y1x ∇(f_v)_xv^T                P_y1y1            P_y1y2            ... ]
            [ P_y2x ∇(f_v)_xv^T                P_y2y1            P_y2y2            ... ]     5.56
            [ ...                              ...               ...                   ]

C 5.5.10 Filter Update

Following a measurement z_i of scene-feature i, and recalling that the measurement vector m_i is a function of x_v and y_i only, the following equations may be derived:

    ∇m_i = ( ∇(m_i)_xv   0   ...   ∇(m_i)_yi   ...   0 )            5.57

where ∇(m_i)_xv and ∇(m_i)_yi denote the Jacobians of m_i with respect to x_v and y_i, and the zero blocks correspond to features not involved in the measurement.

Appendix C Development of a Feature Tracking Strategy 177

    P ∇m_i^T = [ P_xx    P_xy1    P_xy2   ... ] [ ∇(m_i)_xv^T ]
               [ P_y1x   P_y1y1   P_y1y2  ... ] [      0      ]
               [ P_y2x   P_y2y1   P_y2y2  ... ] [ ∇(m_i)_yi^T ]
               [ ...                          ] [      0      ]

or

    P ∇m_i^T = [ P_xx  ]                 [ P_xy_i  ]
               [ P_y1x ] ∇(m_i)_xv^T  +  [ P_y1y_i ] ∇(m_i)_yi^T    5.58
               [ P_y2x ]                 [ P_y2y_i ]
               [ ...   ]                 [ ...     ]

And the innovation covariance S is:

    S = ∇m_i P ∇m_i^T + R
      = ∇(m_i)_xv P_xx ∇(m_i)_xv^T + ∇(m_i)_xv P_xy_i ∇(m_i)_yi^T
        + ∇(m_i)_yi P_y_ix ∇(m_i)_xv^T + ∇(m_i)_yi P_y_iy_i ∇(m_i)_yi^T + R      5.59

Recall that in equation 5.59 R is the covariance matrix of the measurement noise.

From equation 5.27 the Kalman gain W is:

rp ^ XX

W = ¥V(m)^S-' = yix

? vrm,;[s-' +

p ^ xy/

? y2y/

r cf-i Wm,j;s

5.60

The W S W^T term required to update P following a measurement is:

Appendix C Development of a Feature Tracking Strategy 178

    W S W^T = (P ∇m_i^T S^(-1)) S (P ∇m_i^T S^(-1))^T = P ∇m_i^T S^(-1) S S^(-T) ∇m_i P^T
            = P ∇m_i^T S^(-1) ∇m_i P^T                                            5.61

(recall that S is symmetric, so S^(-1) = S^(-T)). Substituting for P ∇m_i^T from equation 5.58 gives equation 5.62.

Further details on updating the components of the estimated state vector and covariance matrix are given below. The derivations assume that the feature observed in the measurement stage has label i, and that there are also two unobserved features j and k.

    W S W^T = [ P_xx  ]
              [ P_y1x ] ∇(m_i)_xv^T S^(-1) ∇(m_i)_xv ( P_xx   P_xy1   P_xy2   ... )
              [ P_y2x ]
              [ ...   ]

            + [ P_xx  ]
              [ P_y1x ] ∇(m_i)_xv^T S^(-1) ∇(m_i)_yi ( P_y_ix   P_y_iy1   P_y_iy2   ... )
              [ P_y2x ]
              [ ...   ]

            + [ P_xy_i  ]
              [ P_y1y_i ] ∇(m_i)_yi^T S^(-1) ∇(m_i)_xv ( P_xx   P_xy1   P_xy2   ... )
              [ P_y2y_i ]
              [ ...     ]

            + [ P_xy_i  ]
              [ P_y1y_i ] ∇(m_i)_yi^T S^(-1) ∇(m_i)_yi ( P_y_ix   P_y_iy1   P_y_iy2   ... )     5.62
              [ P_y2y_i ]
              [ ...     ]


Commencing with the estimated sensor state, the prediction is:

    x_v(new) = f_v(x_v, u)                                                        5.63

and the update:

    x_v(new) = x_v + P_xx ∇(m_i)_xv^T S^(-1) ν + P_xy_i ∇(m_i)_yi^T S^(-1) ν      5.64

The observed feature state prediction is:

    y_i(new) = y_i                                                                5.65

and the update:

    y_i(new) = y_i + P_y_ix ∇(m_i)_xv^T S^(-1) ν + P_y_iy_i ∇(m_i)_yi^T S^(-1) ν  5.66

For an unobserved feature state, the prediction is:

    y_j(new) = y_j                                                                5.67

while the update is:

    y_j(new) = y_j + P_y_jx ∇(m_i)_xv^T S^(-1) ν + P_y_jy_i ∇(m_i)_yi^T S^(-1) ν  5.68

Next the covariance matrix P will be considered. This matrix is partitioned with covariance components comprising: (a) the covariance between the sensor state and itself, (b) the sensor state and the observed feature state, (c) the observed feature state and itself, (d) the sensor state and the unobserved feature state, (e) the observed feature state and an unobserved feature state, and (f) two unobserved feature states.

Because of the considerable size of matrix P it is simpler to define component sub matrices as follows:

    A = ∇(m_i)_xv^T S^(-1) ∇(m_i)_xv                                              5.69

    B = ∇(m_i)_xv^T S^(-1) ∇(m_i)_yi                                              5.70

    C = ∇(m_i)_yi^T S^(-1) ∇(m_i)_xv                                              5.71

    D = ∇(m_i)_yi^T S^(-1) ∇(m_i)_yi                                              5.72


Now for each of the partitioned covariance states:

C 5.5.11 Sensor State and Itself

Prediction:

    P_xx(new) = ∇(f_v)_xv P_xx ∇(f_v)_xv^T + Q                                    5.73

Update:

    P_xx(new) = P_xx - (P_xx A P_xx + P_xx B P_y_ix + P_xy_i C P_xx + P_xy_i D P_y_ix)      5.74

C 5.5.12 Sensor State and Observed Feature State

Prediction:

    P_xy_i(new) = ∇(f_v)_xv P_xy_i                                                5.75

Update:

    P_xy_i(new) = P_xy_i - (P_xx A P_xy_i + P_xx B P_y_iy_i + P_xy_i C P_xy_i + P_xy_i D P_y_iy_i)      5.76

C 5.5.13 Observed Feature State and Itself

Prediction:

    P_y_iy_i(new) = P_y_iy_i                                                      5.77

Update:

    P_y_iy_i(new) = P_y_iy_i - (P_y_ix A P_xy_i + P_y_ix B P_y_iy_i + P_y_iy_i C P_xy_i + P_y_iy_i D P_y_iy_i)      5.78

C 5.5.14 Sensor State and an Unobserved Feature State

Prediction:

    P_xy_j(new) = ∇(f_v)_xv P_xy_j                                                5.79

Update:

    P_xy_j(new) = P_xy_j - (P_xx A P_xy_j + P_xx B P_y_iy_j + P_xy_i C P_xy_j + P_xy_i D P_y_iy_j)      5.80


C 5.5.15 Observed Feature State and an Unobserved Feature State

Prediction:

    P_y_iy_j(new) = P_y_iy_j                                                      5.81

Update:

    P_y_iy_j(new) = P_y_iy_j - (P_y_ix A P_xy_j + P_y_ix B P_y_iy_j + P_y_iy_i C P_xy_j + P_y_iy_i D P_y_iy_j)      5.82

C 5.5.16 And finally Two Unobserved Feature States

Prediction:

    P_y_jy_k(new) = P_y_jy_k                                                      5.83

Update:

    P_y_jy_k(new) = P_y_jy_k - (P_y_jx A P_xy_k + P_y_jx B P_y_iy_k + P_y_jy_i C P_xy_k + P_y_jy_i D P_y_iy_k)      5.84

Equation 5.84 also applies for determination of the covariance between an unobserved feature state and itself (i.e., for j = k).
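A one-dimensional (scalar) illustration of the above partitioned update may help fix ideas: with one sensor state x, the observed feature y_i and one unobserved feature y_j, all covariances become scalars and A, B, C, D collapse to products of the measurement Jacobians. This C sketch is illustrative only (the measurement model and the numbers are assumptions), not MROLR code.

#include <stdio.h>

int main(void)
{
    /* scalar state estimates and covariances (illustrative values only) */
    double x = 0.0, yi = 2.0, yj = 5.0;
    double Pxx = 0.5, Pxyi = 0.1, Pxyj = 0.05;
    double Pyiyi = 0.3, Pyiyj = 0.02, Pyjyj = 0.4;
    double R = 0.01;

    /* assumed measurement m = yi - x, so mx = -1, my = +1 */
    double mx = -1.0, my = 1.0;
    double z  = 2.3;                                 /* actual measurement     */
    double nu = z - (yi - x);                        /* innovation             */

    /* innovation covariance, scalar form of equation 5.59 */
    double S = mx*Pxx*mx + mx*Pxyi*my + my*Pxyi*mx + my*Pyiyi*my + R;

    /* scalar versions of A, B, C, D (equations 5.69-5.72) */
    double A = mx*mx/S, B = mx*my/S, C = my*mx/S, D = my*my/S;

    /* state updates (equations 5.64, 5.66, 5.68); note Pyix = Pxyi here */
    double x_n  = x  + (Pxx*mx  + Pxyi*my)  * nu / S;
    double yi_n = yi + (Pxyi*mx + Pyiyi*my) * nu / S;
    double yj_n = yj + (Pxyj*mx + Pyiyj*my) * nu / S;

    /* covariance updates (equations 5.74-5.84) */
    double Pxx_n   = Pxx   - (Pxx*A*Pxx   + Pxx*B*Pxyi   + Pxyi*C*Pxx   + Pxyi*D*Pxyi);
    double Pxyi_n  = Pxyi  - (Pxx*A*Pxyi  + Pxx*B*Pyiyi  + Pxyi*C*Pxyi  + Pxyi*D*Pyiyi);
    double Pyiyi_n = Pyiyi - (Pxyi*A*Pxyi + Pxyi*B*Pyiyi + Pyiyi*C*Pxyi + Pyiyi*D*Pyiyi);
    double Pxyj_n  = Pxyj  - (Pxx*A*Pxyj  + Pxx*B*Pyiyj  + Pxyi*C*Pxyj  + Pxyi*D*Pyiyj);
    double Pyiyj_n = Pyiyj - (Pxyi*A*Pxyj + Pxyi*B*Pyiyj + Pyiyi*C*Pxyj + Pyiyi*D*Pyiyj);
    double Pyjyj_n = Pyjyj - (Pxyj*A*Pxyj + Pxyj*B*Pyiyj + Pyiyj*C*Pxyj + Pyiyj*D*Pyiyj);

    printf("x=%.3f yi=%.3f yj=%.3f  Pxx=%.4f Pyiyi=%.4f Pyjyj=%.4f\n",
           x_n, yi_n, yj_n, Pxx_n, Pyiyi_n, Pyjyj_n);
    (void)Pxyi_n; (void)Pxyj_n; (void)Pyiyj_n;
    return 0;
}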

C 5.6 Multiple Steps

The development of the above equations permits efficient filtering for the specific case of tracking a single feature over multiple incremental steps. Assume for the moment that the robot is at the start position and is able to observe feature i:

    x̂(0) = [ x_v(0) ]        P(0) = [ P_xx(0)    P_xy1(0)    P_xy2(0)  ]
           [ y_1(0) ]               [ P_y1x(0)   P_y1y1(0)   P_y1y2(0) ]
           [ y_2(0) ]               [ P_y2x(0)   P_y2y1(0)   P_y2y2(0) ]        5.85
           [  ...   ]

During these multiple steps, while the single feature i is being tracked, equations 5.63 to 5.66 and 5.73 to 5.78 could be used to update the state vector and covariance matrix. However, by storing a small amount of information at each filter step it is possible to reduce the computations to that of only modifying x_v, y_i, P_xx, P_xy_i and P_y_iy_i directly. Other parts need to be updated only at the completion of the single-feature tracking, in a systematic and generic manner.

To outline the process, first consider P_xy_j and P_y_iy_j. From examination of equations 5.79 to 5.82 it is evident that they may be expressed in the form

    P_xy_j = E_T P_xy_j(0) + F_T P_y_iy_j(0)                                      5.86

    P_y_iy_j = G_T P_y_iy_j(0) + H_T P_xy_j(0)                                    5.87

For a single measurement update, using equations 5.80 and 5.82,

    P_xy_j(new) = P_xy_j - [P_xx A + P_xy_i C] P_xy_j - [P_xx B + P_xy_i D] P_y_iy_j              5.88

    P_y_iy_j(new) = P_y_iy_j - [P_y_ix A + P_y_iy_i C] P_xy_j - [P_y_ix B + P_y_iy_i D] P_y_iy_j  5.89

To obtain a form similar to equations 5.86 and 5.87, matrices E, F, G and H are defined as:

    E = I - [P_xx A + P_xy_i C]                                                   5.90

    F = -[P_xx B + P_xy_i D]                                                      5.91

    G = I - [P_y_ix B + P_y_iy_i D]                                               5.92

    H = -[P_y_ix A + P_y_iy_i C]                                                  5.93

Equations 5.88 and 5.89 can now be written as:

    P_xy_j(new) = E P_xy_j + F P_y_iy_j                                           5.94

    P_y_iy_j(new) = G P_y_iy_j + H P_xy_j                                         5.95


which is in the desired form. The coefficient matrices E_T, F_T, G_T, H_T of equations 5.86 and 5.87 then become, for a measurement update:

    E_T(new) = E E_T + F H_T                                                      5.96

    F_T(new) = E F_T + F G_T                                                      5.97

    G_T(new) = G G_T + H F_T                                                      5.98

    H_T(new) = G H_T + H E_T                                                      5.99

At a prediction (from equations 5.79 and 5.81), E_T and F_T become:

    E_T(new) = ∇(f_v)_xv E_T                                                      5.100

    F_T(new) = ∇(f_v)_xv F_T                                                      5.101

with G_T and H_T remaining unchanged.

Consider next P_y_jy_k. To express the prediction and measurement updates for P_y_jy_k (the covariance between the position estimates of two of the unobserved features) in the same form as the equations just derived, it can be deduced from equations 5.83 and 5.84 that:

    P_y_jy_k = P_y_jy_k(0) - [P_y_jx(0) A_T P_xy_k(0) + P_y_jx(0) B_T P_y_iy_k(0)
               + P_y_jy_i(0) C_T P_xy_k(0) + P_y_jy_i(0) D_T P_y_iy_k(0)]         5.102

For a single measurement update, re-expressing in the form of equation 5.84 gives

    P_y_jy_k(new) = P_y_jy_k - [P_y_jx A P_xy_k + P_y_jx B P_y_iy_k
                    + P_y_jy_i C P_xy_k + P_y_jy_i D P_y_iy_k]                    5.103

Substituting for P_y_jx, P_y_jy_i, P_xy_k and P_y_iy_k,

    P_y_jy_k(new) = P_y_jy_k
        - P_y_jx(0) [E_T^T A E_T + E_T^T B H_T + H_T^T C E_T + H_T^T D H_T] P_xy_k(0)
        - P_y_jx(0) [E_T^T A F_T + E_T^T B G_T + H_T^T C F_T + H_T^T D G_T] P_y_iy_k(0)
        - P_y_jy_i(0) [F_T^T A E_T + F_T^T B H_T + G_T^T C E_T + G_T^T D H_T] P_xy_k(0)
        - P_y_jy_i(0) [F_T^T A F_T + F_T^T B G_T + G_T^T C F_T + G_T^T D G_T] P_y_iy_k(0)        5.104

A_T, B_T, C_T and D_T at a measurement update become:

    A_T(new) = A_T + E_T^T A E_T + E_T^T B H_T + H_T^T C E_T + H_T^T D H_T        5.105

    B_T(new) = B_T + E_T^T A F_T + E_T^T B G_T + H_T^T C F_T + H_T^T D G_T        5.106

    C_T(new) = C_T + F_T^T A E_T + F_T^T B H_T + G_T^T C E_T + G_T^T D H_T        5.107

    D_T(new) = D_T + F_T^T A F_T + F_T^T B G_T + G_T^T C F_T + G_T^T D G_T        5.108

In a prediction, equation 5.83 shows that P_y_jy_k remains unchanged, and it is therefore unnecessary to change A_T, B_T, C_T and D_T for a prediction.

Consider next y_j. Proceeding as above, the estimated state y_j of an unobserved feature is:

    y_j = y_j(0) + P_y_jx(0) m_T + P_y_jy_i(0) n_T                                5.109

(with m_T and n_T yet to be defined), and for a measurement update:

    y_j(new) = y_j + P_y_jx ∇(m_i)_xv^T S^(-1) ν + P_y_jy_i ∇(m_i)_yi^T S^(-1) ν  5.110

Substituting for P_y_jx and P_y_jy_i from equations 5.86 and 5.87,

    y_j(new) = y_j + P_y_jx(0) (E_T^T ∇(m_i)_xv^T + H_T^T ∇(m_i)_yi^T) S^(-1) ν
               + P_y_jy_i(0) (F_T^T ∇(m_i)_xv^T + G_T^T ∇(m_i)_yi^T) S^(-1) ν     5.111

Thus in a measurement step:

    m_T(new) = m_T + (E_T^T ∇(m_i)_xv^T + H_T^T ∇(m_i)_yi^T) S^(-1) ν             5.112

    n_T(new) = n_T + (F_T^T ∇(m_i)_xv^T + G_T^T ∇(m_i)_yi^T) S^(-1) ν             5.113


Reference to equation 5.67 shows that in a prediction step y_j remains unchanged, and there is therefore no need to alter m_T and n_T.

To clarify and summarise the above procedures, consider the situation where the robot has mapped multiple features but is currently only making measurements of feature i; j and k are two unobserved features. The parts of the estimated state vector and covariance matrix that are directly involved in the motion and measurement are continuously updated. The information needed to generically update the other sections can be conveniently and efficiently stored, so that a single step, at the end of the motion, can bring them up to date.

Prior to motion, the estimated state vector and covariance of the system are designated by x̂(0) and P(0), and the following matrices and vectors are initialised to:

    A_T = 0,  B_T = 0,  C_T = 0,  D_T = 0,  E_T = I,  F_T = 0,  G_T = I,  H_T = 0,  m_T = 0,  n_T = 0

Each of these stores the generic update information referred to. The subscript T is used to signify "Total". Note that A_T ... H_T are square matrices equal in size to y_j, while m_T and n_T are column vectors of the same dimension.

Prediction

While the cameras are in motion, the components of the estimated state vector and covariance are updated as below:

    x_v(new) = f_v(x_v, u)                                                        5.114

    y_i(new) = y_i                                                                5.115

    P_xx(new) = ∇(f_v)_xv P_xx ∇(f_v)_xv^T + Q                                    5.116

    P_xy_i(new) = ∇(f_v)_xv P_xy_i                                                5.117

    P_y_iy_i(new) = P_y_iy_i                                                      5.118

And the stored matrices:

    E_T(new) = ∇(f_v)_xv E_T                                                      5.119

    F_T(new) = ∇(f_v)_xv F_T                                                      5.120


Measurement Update

The following matrices and updates are calculated for measurement of feature i:

    A = ∇(m_i)_xv^T S^(-1) ∇(m_i)_xv                                              5.121

    B = ∇(m_i)_xv^T S^(-1) ∇(m_i)_yi                                              5.122

    C = ∇(m_i)_yi^T S^(-1) ∇(m_i)_xv                                              5.123

    D = ∇(m_i)_yi^T S^(-1) ∇(m_i)_yi                                              5.124

    E = I - [P_xx A + P_xy_i C]                                                   5.125

    F = -[P_xx B + P_xy_i D]                                                      5.126

    G = I - [P_y_ix B + P_y_iy_i D]                                               5.127

    H = -[P_y_ix A + P_y_iy_i C]                                                  5.128

    x_v(new) = x_v + P_xx ∇(m_i)_xv^T S^(-1) ν + P_xy_i ∇(m_i)_yi^T S^(-1) ν      5.129

    y_i(new) = y_i + P_y_ix ∇(m_i)_xv^T S^(-1) ν + P_y_iy_i ∇(m_i)_yi^T S^(-1) ν  5.130

    P_xx(new) = P_xx - (P_xx A P_xx + P_xx B P_y_ix + P_xy_i C P_xx + P_xy_i D P_y_ix)                     5.131

    P_xy_i(new) = P_xy_i - (P_xx A P_xy_i + P_xx B P_y_iy_i + P_xy_i C P_xy_i + P_xy_i D P_y_iy_i)         5.132

    P_y_iy_i(new) = P_y_iy_i - (P_y_ix A P_xy_i + P_y_ix B P_y_iy_i + P_y_iy_i C P_xy_i + P_y_iy_i D P_y_iy_i)      5.133

Stored matrices and vectors:

    m_T(new) = m_T + (E_T^T ∇(m_i)_xv^T + H_T^T ∇(m_i)_yi^T) S^(-1) ν             5.134

    n_T(new) = n_T + (F_T^T ∇(m_i)_xv^T + G_T^T ∇(m_i)_yi^T) S^(-1) ν             5.135

    A_T(new) = A_T + E_T^T A E_T + E_T^T B H_T + H_T^T C E_T + H_T^T D H_T   (as in equation 5.105)

    B_T(new) = B_T + E_T^T A F_T + E_T^T B G_T + H_T^T C F_T + H_T^T D G_T        5.136

    C_T(new) = C_T + F_T^T A E_T + F_T^T B H_T + G_T^T C E_T + G_T^T D H_T        5.137

    D_T(new) = D_T + F_T^T A F_T + F_T^T B G_T + G_T^T C F_T + G_T^T D G_T        5.138

    E_T(new) = E E_T + F H_T                                                      5.139

    F_T(new) = E F_T + F G_T                                                      5.140

    G_T(new) = G G_T + H F_T                                                      5.141

    H_T(new) = G H_T + H E_T                                                      5.142

Note: (1) The order of calculating m_T, n_T and A_T ... D_T is important, as the expressions make use of the old values of E_T ... H_T. (2) Similarly, for equations 5.139 to 5.141, the matrices on the right-hand side of the expressions use the old versions of E_T ... H_T.

To accommodate these requirements, careful storage of E_T to H_T is required. The benefit is that the computational expense of making updates to the stored matrices and vectors at prediction and measurement is constant, independent of the number of features in the map. To achieve this, the parts of the total state vector and covariance matrix not related to the current update are not processed until later, at the end of the motion, with the information required for this later processing being stored in a generic way.

Final Full Update

Once the motion has been completed, the full estimated state vector can readily be brought up to date using the following equation.

For each unobserved feature j:

    y_j = y_j(0) + P_y_jx(0) m_T + P_y_jy_i(0) n_T                                5.143


The update of the covariance between a pair of unobserved features j and k is:

    P_y_jy_k = P_y_jy_k(0) - [P_y_jx(0) A_T P_xy_k(0) + P_y_jx(0) B_T P_y_iy_k(0)
               + P_y_jy_i(0) C_T P_xy_k(0) + P_y_jy_i(0) D_T P_y_iy_k(0)]         5.144

To update the parts of the covariance matrix relating unobserved feature j to the sensor state and the observed feature state (equations 5.145 and 5.146):

    P_xy_j = E_T P_xy_j(0) + F_T P_y_iy_j(0)                                      5.145

    P_y_iy_j = G_T P_y_iy_j(0) + H_T P_xy_j(0)                                    5.146

Choice of Feature

Following a feature measurement m_i, the innovation covariance matrix S provides a measure of how much the actual measurement is expected to vary from that predicted, in the angular measurement coordinates (α, ε, γ) (see equation 5.40). This knowledge can be used as the basis for deciding which feature to track. The volume in (α, ε, γ) space of the ellipsoid represented by S at the n_σ level can be calculated for each visible feature: eigenvalues λ1, λ2, λ3 of S mean that the ellipsoid has axes of length n_σ√λ1, n_σ√λ2 and n_σ√λ3. If this volume is labelled V_S (equation 5.147), then its calculation for each of the features currently visible provides a criterion for selection, based on the feature with the highest value.

    V_S = (4/3) π n_σ^3 √(λ1 λ2 λ3)                                               5.147
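The criterion of equation 5.147 (as reconstructed above) is a one-line computation once the eigenvalues of S are available; the sketch below is illustrative only, with hypothetical eigenvalues standing in for those of two visible features.

#include <math.h>
#include <stdio.h>

/* Volume of the n_sigma innovation ellipsoid: axes n_sigma*sqrt(lambda_i). */
double ellipsoid_volume(double l1, double l2, double l3, double n_sigma)
{
    return (4.0 / 3.0) * 3.14159265358979 * pow(n_sigma, 3.0) * sqrt(l1 * l2 * l3);
}

int main(void)
{
    /* hypothetical eigenvalues of S for two visible features */
    double vA = ellipsoid_volume(1.0e-4, 2.0e-4, 0.5e-4, 3.0);
    double vB = ellipsoid_volume(4.0e-4, 3.0e-4, 2.0e-4, 3.0);
    printf("V_S(A)=%g  V_S(B)=%g  -> measure feature %s\n",
           vA, vB, vB > vA ? "B" : "A");
    return 0;
}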

This criterion takes no account of the time the head would take to saccade to an alternative feature. For instance, if two features are widely separated (one in front of the robot and the other at 90 degrees), then the time for the head to move to a position where it can commence the measurement of the new feature can be quite long (perhaps 1 second). This impinges significantly on the total navigation time. An optimal strategy would incorporate a penalty for this saccading time into the choice-of-feature criterion. The Davison criterion adopted for this work does so, and is outlined below.

The decision as to whether to switch to a new feature or stay tracking the existing one is made while the robot is moving. The choice is based on the predicted robot and map state that would exist in a short future interval of time if (a) the saccade to, and measurement of, the chosen feature was made, or (b) tracking of the existing feature continued. The referred-to "short future interval of time" is the estimated time it would take to saccade to and measure any feature under consideration.


Formally the decision process is implemented according to:

1. N_i, the number of measurements that would be lost while saccading to each of the visible features, is calculated. To determine this, an estimate of the head movement time to correctly move to and locate each feature is required.

2. Ascertain the largest Ni (Nmax); this represents the largest number of measurements lost during the longest saccade.

3. If an immediate saccade is to be made, estimate for each feature i the state that would exist after Nmax + 1 steps.

4. Evaluate V_S(max) for each of the above estimated states and saccade to the feature with the lowest V_S(max), unless the V_S(max) of the feature being tracked is already the lowest.


Bibliography

[AA91] F. Arman and J. K. Aggarwal, "Automatic generation of recognition strategies using CAD models," Workshop on Directions in Automated CAD Based Vision, pp. 124-133, Los Alamitos, California, IEEE Computer Society Press, June 1991.

[AA93] F. Arman and J. K. Aggarwal, "Model-based object recognition in dense-range images ~ A review," ACM Computing Surveys, 25(1), pp. 5-43, 1993.

[AL91] M. Andersson and A. Lundquist, "Tracking lines in a stereo image sequence," Technical Report TRITANAP9116, Dept. of Numerical Analysis and Computing Science, Royal Institute of Technology, July 1991.

[AN93a] M. Andersson and P. Nordlund, "Modeling, Matching and Tracking for the Stereovision II project," Technical Report TRITANAP9322, Dept. of Numerical Analysis and Computing Science, Royal Institute of Technology, June 1993.

[AN93b] M. Andersson and P. Nordlund, "Modeling, Matching and Tracking for the Stereovision II project: Program Description," Technical Note CVAP TNI2, Dept. of Numerical Analysis and Computing Science, Royal Institute of Technology, June 1993.

[ANE93] M. Andersson, P. Nordlund, and J.O. Eklundh, "Modeling, matching and tracking for the Stereovision II project: State-of-the-Art review," Report ISRN KTH/NA/P-93/21-SE, June 1993.

[Ark98] R.C. Arkin, "Behavior-Based Robotics," Cambridge, MA.- MIT Press, 1998.

[AS97] C.G. Atkeson and S. Schaal, "Robot learning from demonstration," in D. H. Fisher, editor, Proc. of the Int. Conf. on Machine Learning, pp. 12-20, Morgan Kaufmann, 1997.

[Aya91] N. Ayache, "Artificial Vision for Mobile Robots: Stereo Vision and Multi-sensory Perception," MIT Press, Cambridge MA, 1991.

[AZH96] M. Armstrong, A. Zisserman and R. Hartley, "Selfcalibration from image triplets," B. Buxton and R. Cipolla, editors, in Proc. 3rd European Conf. on Computer Vision, Cambridge, Volume 1, pp. 3-16, April 1996.

[BA02] D.C. Bentivegna and C.G. Atkeson, "Leaming How to Behave from Observation," SAB'02-Workshop on Motor Control in Humans and Robots: on the interplay of real brains and artificial devices, Edinburgh, UK, August, 2002.

[Baj88] R. Bajcsy, "Active perception," in Proc. IEEE, 76(8), pp. 996-1005, 1988.


[BCFHLSST98] W. Burgard, A.B. Cremers, D. Fox, Hahnel, G. Lakemeyer, D. Schulz, W. Steiner and S. Thrun, "The interactive museum tour-guide robot," in Proc. ofthe Fifteenth National Conf. on Artificial Intelligence, 1998.

[BDFC98] W. Burgard, A. Derr, D. Fox, and A.B. Cremers, "Integrating global position estimation and position tracking for mobile robots: The dynamic Markov localization approach," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 1998.

[BEF96] J. Borenstein, H. Everett and L. Feng, "Navigating Mobile Robots: Systems and Techniques," A.K. Peters Ltd., Wellesley, MA, 1996.

[Ben96] A. Bennet, "Do animals have cognitive maps ?," Journal of Experimental Biology, pp. 219-224, 1996.

[Bes88] P.J. Besl. "Geometric modeling and computer vision," in Proc. of the IEEE, 76(8), pp. 936-958, August 1988.

[Bet96] A. Bethell, "A hippocampally-inspired connectionist model for mobile robot localisation," Master's thesis, Department of Computer Science, University of Manchester, 1996.

[BHH84] R.C. Bolles, P. Horaud and M.J. Hannah, "3DPO: A three dimensional part orientation system," in Michael Brady and Richard Paul, editors, Robotics Research, pp. 413-424, Cambridge, Massachusetts, USA, National Science Foundation and System Development Corporation, The MIT Press, 1984.

[Bie85] I. Biederman, "Human image understanding: recent research and theory," Computer Graphics and Image Processing, 32, pp. 29-73, 1985.

[Bin82] T.O. Binford, "Survey of Model-Based Image Analysis Systems," The Int. Journal of Robotics Research, Vol. 1, No. 1, 1982.

[Bin87] T.O. Binford, "Generalized Cylinder Representation," Encyclopedia of Artificial Intelligence, pp. 321-323, John Wiley & Sons, 1987.

[Bis95] C. Bishop, "Neural Networks for Pattern Recognition," Oxford, England: Oxford University Press, 1995.

[BJ85] P.J. Besl and R.C. Jain, "Three-dimensional object recognition," ACM Computing Surveys, 17(1), pp. 75-145, 1985.

[BK96] P. Bakker and Y. Kuniyoshi, "Robot see, robot do: An overview of robot imitation," AISB96 Workshop on Learning in Robots and Animals, pp. 3-11, 1996.

[BK96b] C. Balkenius and L. Kopp, "The XT-1 vision architecture," in Proc. of the Symposium on Image Analysis, Lund University, Sweden, pp. 39-43, 1996.


[BMRM94] K.J. Bradshaw, P.F. McLauchlan, I.D. Reid and D.W. Murray, "Saccade and pursuit on an active head/eye platform," Image and Vision Computing, 12(3), pp. 155-163, 1994.

[BNA89] J.P. Brady, N. Nandhakumar and J.K. Aggarwal, "Recent progress in object recognition from range data," Image and Vision Computing, 7(4), pp. 295-307, November 1989.

[B088] D.H. Ballard and A. Ozcandarii, "Eye fixation and kinetic depth," in Proc. 2nd Int. Conf. on Computer Vision, Tampa, pp. 524-532, 1988.

[Bow91] K. Bowyer, "Why aspect graphs are not (yet) practical for computer vision," Workshop on Directions in Automated CAD Based Vision, pp. 98-104, Los Alamitos, California, IEEE, Computer Society Press, June 1991.

[BP95] J.Y. Bouget and P. Perona, "Visual navigation using a single camera," ICCV5, pp. 645-652, Los Alamitos, CA, IEEE Computer Society Press, 1995.

[Bro81] R.A. Brooks, "Symbolic reasoning among 3D models and 2D images," J. of Artificial Intelligence, 17(1-3), pp. 285-348, 1981.

[Bro86] R.A. Brooks, "A robust layered control system for a mobile robot," Technical Report 864, Massachusetts Institute of Technology, 1986.

[Bro87] R.A. Brooks, "Intelligence without representation," Workshop on the Foundations of Artificial Intelligence, Dedham, MA, 1987.

[Bro90] R.A. Brooks and L.A. Stein, "Building brains for bodies," Technical Report A.I. Memo 1439, Massachusetts Institute of Technology, August 1993.

[Bro90b] C.M. Brown, "Gaze control with interactions and delays," IEEE Trans. Sys. Man and Cyb., 63, pp. 61-70, 1990.

[BRZW95] P.A. Beardsley, I. D. Reid, A. Zisserman and D.W. Murray, "Active visual navigation using nonmetric structure," in Proc. 5th Int. Conf. on Computer Vision, Boston, pp. 58-65. IEEE Computer Society Press, 1995.

[Bur94] T.P.H. Burke, "Design of a Modular Mobile Robot," PhD thesis. University of Oxford, 1994.

[Bur96] W. Burgard, D. Fox, D. Henning and T. Schmidt, "Estimating the absolute position of a mobile robot using position probability grids," in Proc. ofthe Thirteenth National Conf of Artificial Intelligence (AAAI'96), pp. 896-901, 1996.

[BW92] D. Ballard and S. Whitehead, "Learning visual behaviours," in Wechsler, ed., Neural Networks for Perception, New York: Academic Press, 1992.


[BW02] N. Burley and M. Wingate, "An Active Camera Target Surveillance Tracker," 4th Int. Conf. on Modelling and Simulation, pp. 440-445, Australia, Nov 2002.

[BWB94] M. Brady, H. Wang and C. Bowman, "Architectures and algorithms for 3D vision: the parallel Droid system," in S. Cameron and P. Probert, editors, Advanced Guided Vehicles, pp. 125-139, World Scientific, Singapore, 1994.

[BZC92] A. Blake, A. Zisserman and R. Cipolla, "Visual exploration of free space," in A. Blake and A. Yuille, editors, Active Vision, MIT Press, Cambridge, MA, 1992.

[BZM94a] P.A. Beardsley, A. Zisserman and D. W. Murray, "Navigation using affine structure from motion," in Proc. 3rd European Conf. on Computer Vision, Stockholm, vol. 2, pp. 85-96, 1994.

[BZM94b] P.A. Beardsley, A. Zisserman and D. W. Murray, "Sequential update of projective and affine structure and motion," Technical Report QUEL report 2012/94, Dept. of Engineering Science, University of Oxford, 1994.

[Can83] J.F. Canny, "Finding edges and lines in images," Master's thesis, MIT, 1983.

[Cap99] F. Caparrelli, "View-Based Three-Dimensional Object Recognition Using Pairwise Geometric Histograms," Ph.D Thesis, University of Sheffield, Sept 1999.

[Car02] O. Carmichael, "Discriminant Filters for Object Recognition," Tech. Report CMU-RI-TR-02-09, Robotics Institute, Carnegie Mellon University, March 2002.

[CB90] R. Cipolla and A. Blake, "The dynamic analysis of apparent contours," in Proc. 3rd Int. Conf. on Computer Vision, Osaka, 1990.

[CCV85] I. Carlbom, I. Chakravarty and D. Vanderschel, "A Hierarchical Data Structure for Representing the Spatial Decomposition of 3D Objects," in Frontiers in Computer Graphics (T.L. Kunii, ed.), pp. 2-12, Springer-Verlag: New York, 1985.

[CD86] R.T. Chin and C.R. Dyer, "Model-based recognition in robot vision," ACM Computing Surveys, 18(1), pp. 67-108, 1986.

[CDTOO] J.A. Castellanos, M. Devy, J.D. Tardos, "Simultaneous Localization and Map Building for Mobile Robots: A Landmark-based Approach," IEEE Int. Conference on Robotics and Automation: Workshop on Mobile Robot Navigation and Mapping, San Francisco, 2000.

[CG87] G. Carpenter and S. Grossberg, "ART2: Self organization of stable category recognition codes for analog input patterns," Applied Optics 26(23), pp. 4919-4930, 1987.


[CH02] O.Carmichael and M.Herbert, "Object Recognition by a Cascade of Edge Probes," British Machine Vision Conf 2002, British Machine Vision Association, Vol. 1, September, pp. 103-110, 2002.

[CJ94] J.D. Courtney and A.K. Jain, "Mobile robot localization via classification of multi-sensor maps," in Proc. of the IEEE Int. Conf on Robotics and Automation, pp. 1672-1678, 1994.

[CKK96] A. Cassandra, L.P. Kaelbling and J. Kurien, "Acting under uncertainty: Discrete Bayesian models for mobile-robot navigation," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 1996.

[CL94] I.J. Cox and J.J. Leonard, "Modeling a dynamic environment using a Bayesian multiple hypothesis approach," Artificial Intelligence 66, pp. 311-344, 1994.

[C198] J.C. Clarke, "Applications of Sequence Geometry to Visual Motion," PhD thesis, University of Oxford, 1998.

[CM93] A.Y.S. Chong and M. Wingate, "A Colour Feature Based Stereo Vision System for Robot Assembly," ICA/ACME Int. Conf. on Assembly combined with the Australian Conf. on Manufacturing Engineering, Nov. 1993.

[Cox 91] I.J. Cox, " 'BLANCHE', an experiment in guidance and navigation of an autonomous robot vehicle," IEEE Transactions on Robotics and Automation 7, pp. 193-204, 1991.

[CP94] S.A. Cameron and P.J. Probert, editors, "Advanced Guided Vehicles — Aspects ofthe Oxford A G V Project," World Scientific, Singapore, 1994.

[Cro95] J.L. Crowley, "Mathematical foundations of navigation and perception for an autonomous mobile robot," Workshop on Reasoning With Uncertainty in Robotics. Tutorial paper, 1995.

[Cso97] M. Csorba, ''Simultaneous Localisation and Map Building," PhD thesis, Oxford University, 1997.

[CUD96] M. Csorba, J. K. Uhlmann and H. F. Durrant-Whyte, "A new approach to simultaneous localization and dynamic map building," in Proc. SPIE, pp. 26-36, 1996.

[Dav98] A.J. Davison, "Mobile Robot Navigation Using Active Vision," Ph.D. thesis, University of Oxford, 1998.

[DavOl] A.J.Davison, "Models and State Representation in Scene: Generalised Software for Sequential Map-Building and Localisation," Feb. 6, 2001: http://www.robots.ox.uk/~aid/.


[DC98] M. Dorigo and M. Columbetti, "Robot Shaping: an experiment in behavior engineering," MIT Bradford Press, 1998.

[DFBT99] F. Dellaert, D. Fox, W. Burgard and S. Thrun, "Monte Carlo localization for mobile robots," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 1322-1328, 1999.

[DHart73] R. Duda and P. Hart, "Pattern Classification and Scene Analysis," J. Wiley and Sons, 1973.

[dHB98] L. de Agapito, D.Q. Huynh and M.J. Brooks, "Self-calibrating a stereo head: an error analysis in the neighbourhood of degenerate configurations," in Proc. 6th Int. Conf. on Computer Vision, Bombay, pp. 747-753, 1998.

[Dic91] S.J. Dickinson, "The recovery and recognition of three dimensional objects using part-based aspect matching," Technical Report CARTR572, CS

[Dic99] E.D. Dickmanns, "Computer Vision and Highway Automation," Vehicle System Dynamics, vol. 31, no 5, pp. 325-343, 1999.

[DKOO] A. J. Davison and N. Kita, "Sequential localisation and map-building," in Proc. of the 2nd Workshop on Structure from Multiple Images of Large Scale Environments (SMILE), in conjunction with ECCV 2000, Dublin, Ireland, Springer-Verlag LNCS, 2000.

[DK02] G.N. DeSouza and A.C. Kak, "Vision for Mobile Robot Navigation: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, pp. 237-264, 2002.

[DKH98] H.A.L. van Dijck, M.J. Korsten and F. van der Heijden, "2 and 3 Dimensional Geometric Hashing with Points and Lines," in Proc. of the Fourth Annual Conf. of the Advanced School for Computing and Imaging, ASCIVS, pp. 51-57, Lommel, Belgium, ISBN 09-803086-3-3, 1998.

[DM92] E.D. Dickmanns and B. Mysliwetz, "Recursive 3D Road and Relative Egostate Recognition," IEEE trans. Pattern Analysis and Machine Intelligence, vol 14, no. 2, pp. 499-507, Feb, 1992

[DN96] T. Duckett and U. Nehmzow, "A robust perception-based localisation method for a mobile robot," Technical Report UMCS-96-11-1, Department of Computer Science, University of Manchester, 1996.

[DN97a] T. Duckett and U. Nehmzow, "Experiments in evidence based localisation for a mobile robot," in Proc. ofthe AISB'97 Workshop on Spatial Reasoning in Animals and Robots, pp. 25-34, Department of Computer Science, Manchester University, Technical Report Series, ISSN 1361-6161, Report UMCS-97-4-1, 1997.


[DN97b] T. Duckett and U. Nehmzow, "Knowing your place in the real world," ECAL-97 Fourth European Conf. on Artificial Life, 1997.

[DN98] T. Duckett and U. Nehmzow, "Mobile robot self-localisation and measurement of performance in middle scale environments," Robotics and Autonomous Systems 24(1-2), pp. 57-69, 1998.

[DN99] T. Duckett and U. Nehmzow, "Learning to predict sonar readings for mobile robot landmark selection," Report of the 4th visit in the Academic Research Collaboration (ARC) funded by the British Council and Deutscher Akademischer Austauschdienst, June 24 to July 13, Bremen University, Germany, 1999.

[DNDCC99] M.W. Dissanayake, P. Newman, H.F. Durrant-Whyte, S. Clark, and M. Csorba, "A solution to the simultaneous localisation and map building (SLAM) problem," Internal Tech Report ACFR-TR-01-99, Australian Centre for Field Robotics, University of Sydney, Australia, January 1999.

[Don93] J. Dormett, "Analysis and Synthesis in the Design of Locomotor and Spatial Competences for a Multisensory Mobile Robot," Ph.D. Dissertation, Edinburgh University, 1993.

[Dra93] B. Draper, "Learning from the Schema Learning System," in AAAI Symposium on Machine Learning and Computer Vision, pp. 75-79, 1993.

[DRLR88] M. Dhome, M. Richetin, J.T. Lapreste and G. Rives, "The inverse perspective problem from a single view for polyhedra location," in Proc. of IEEE Computer Vision and Pattern Recognition, pp. 61-66, June 1988.

[DRLR89] M. Dhome, M. Richetin, J.T. Lapreste and G. Rives, "Determination of the attitude of 3d objects from a single perspective view," IEEE Trans. Pattern Anal, and Machine Intell, 11(12), pp. 1265-1278, 1989.

[DRM95] A.J. Davison, I.D. Reid and D.W. Murray, "The active camera as a projective pointing device," in Proc. 6th British Machine Vision Conf., Birmingham, pp. 453-462, 1995.

[DucOO] T.D. Duckett, "Concurrent Map Building and Self Localisation for Mobile Robot Navigation," Ph.D Thesis, Univ. of Manchester, U.K., 2000.

[Dur94] H.W. Durrant-Whyte, "Where am I? A tutorial on mobile vehicle localization," Industrial Robot, 21(2), pp. 11-16, 1994.

[DZ86] E.D. Dickmanns and A. Zapp, "A Curvature-Based Scheme for Improving Road Vehicle Guidance by Computer Vision," in Proc. SPIE - The Int. Soc. Optical Eng. - Mobile Robots, vol. 727, pp. 161-168, Oct. 1986.

[DZ87] E.D. Dickmanns and A. Zapp, "Autonomous High Speed Road Vehicle Guidance by Computer Vision," in Proc. 10th World Congress on Automatic Control, vol. 4, pp. 232-237, 1987.

[EBDCG92] D.W. Eggert, K.W. Bowyer, C.R. Dyer, H. I. Christensen and D. B. Goldgof, "Applying the scale space concept to perspective projection aspect graphs," Technical Report LIA 92-02, Laboratory of Image Analysis, Institute of Electronic Systems, Fredrik Bajers Vej 7D, DK 9220 Aalborg O, Denmark, January 1992. Extract from the book from the 7th Scandinavian Conference on Image Analysis.

[Elfes 1987] A. Elfes, "Sonar-based real-world mapping and navigation," IEEE Journal of Robotics and Automation, RA-3(3), pp. 249-265, 1987.

[EM92] S.P. Engelson and D.V. McDermott, "Error correction in mobile robot map learning," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 2555-2560, 1992.

[Eng94] S. Engelson, "Passive Map Learning and Visual Place Recognition," Ph.D. Dissertation, Department of Computer Science, Yale University, 1994.

[ETM93a] A.C. Evans, N.A. Thacker and J.E.W. Mayhew, "The Use of Geometric Histograms for Model-Based Object Recognition," Proc. 4th BMVC93, Guildford, 21-23, pp. 429-438, Sept. 1993.

[ETM93b] A.C. Evans, N.A. Thacker and J.E.W. Mayhew, "A Practical View Based 3D Object Recognition System," IEE conference on neural networks, Brighton, pp. 6-10, 1993.

[Eva90] R. Evans, "Kalman filtering of pose estimates in applications of the RAPID video rate tracker," in Proc. of the BMVC, pp. 79-84, 1990.

[EW95] T. Edlinger and G. Weiss, "Exploration, navigation and self-localisation in an autonomous mobile robot," Autonomous Mobile Systems, pp. 142-151, 1995.

[Fau86] O.D. Faugeras, "The representation, recognition, and locating of 3D objects," Int. Journal of Robotic Research, 5(3): pp. 27-52, Fall 1986.

[Fau92] O.D. Faugeras, "What can be seen in three dimensions with an uncalibrated stereo rig?," in G. Sandini, editor, Proc. 2nd European Conf. on Computer Vision, Santa Margharita Ligure, Italy, pp. 563-578, Springer-Verlag, 1992.

[Fau93] O.D. Faugeras, "Three Dimensional Computer Vision," MIT Press, 1993.

[Fau95] O.D. Faugeras, "Stratification of three-dimensional vision: projective, affine, and metric representations," J. Opt. Soc. Am. A, 12(3), pp. 465-484, 1995.


[FBT98] D. Fox, W. Burgard and S. Thrun, "Active Markov localisation for mobile robots," Robotics and Autonomous Systems 25: pp. 195-207, 1998.

[Fer91] N.J. Ferrier, "The Harvard binocular head," Technical Report 919, Harvard Robotics Laboratory, 1991.

[FHPP84] O.D. Faugeras, M. Hebert, J. Ponce and E. Pauchon, "Object representation, identification, and positioning from range data," in Proc. 1st Int. Symp. on Robotics Res., MIT Press, Cambridge, MA, USA, 1984, pp. 425-446.

[Fis89] Robert B. Fisher, "From Surfaces to Objects," John Wiley & Sons, Baffins Lane, Chichester, West Sussex, PO19 1UD, England, 1990.

[FJ91a] P.J. Flynn and A.K. Jain, "Bonsai: 3D object recognition using constrained search," IEEE Trans. Pattern Anal. and Machine Intell., 13(10), pp. 1066-1075, October 1991.

[FJ91b] P.J. Flynn and A.K. Jain, "CAD based computer vision: From CAD models to relational graphs," IEEE Trans. Pattern Anal. and Machine Intell., 13(2), pp. 114-132, 1991.

[FMN88] T.J. Fan, G. Medioni and R. Nevatia, "3D object recognition using surface descriptions," in Proc. of the Image Understanding Workshop, pp. 383-397. DARPA, April 1988.

[FP02] D. Forsyth and J. Ponce, "Computer Vision - A Modern Approach," to be published, Prentice Hall, 2002; currently online.

[FRM95] S.M. Pauley, I.D. Reid and D.W. Murray, "Transfer of fixation for an active stereo platform via affine structure recovery," in Proc. 5th Int. Conf. on Computer Vision, Boston, pp. 1100-1105, 1995.

[Fri95] B. Fritze, "A Growing Neural Gas Network Learns Topologies," Advances in Neural Information Processing Systems, 7, 1995.

[FSMB98] M. Franz, B. Schoelkopf, H. Mallot and H. Buelthoff, "Learning view graphs for robot navigation," Autonomous Robots 5, pp. 111-125, 1998.

[Gall90] C.R. Gallistel, "The Organisation of Learning," MIT Press, Cambridge MA, 1990.

[Gat98] E. Gat, "Three layer architectures," In D. Kortenkamp, R. Bonasso, and R. Murphy, eds., Al-based Mobile Robots: Case Studies of Successful Robot Systems. Cambridge, MA: MIT Press, 1998.

[GBFK98] J.S. Gutmann, W. Burgard, D. Fox and K. Konolige, "An experimental comparison of localization methods," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 1998.

[Gel96] A. Gelb, "Applied Optimal Estimation," MIT Press, 14th Edition, 1996.

[GH91] W.E.L. Grimson and Daniel P. Huttenlocher, "On the verification of hypothesized matches in model based recognition," IEEE Trans. Pattern Anal. and Machine Intell., 13(12), pp. 1201-1213, December 1991.

[Gib85] A. Gibbons, "Algorithmic Graph Theory," Cambridge, England: Cambridge University Press, 1985.

[GLP84] W.E.L. Grimson and T. Lozano-Perez, "Model-based recognition and localization from sparse range or tactile data," Int. Journal of Robotics Research, 3(3), pp. 3-35, 1984.

[GLP87] W.E.L. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree," IEEE Trans. Pattern Anal. and Machine Intell., 9(4), pp. 469-482, July 1987.

[GMR98] M. Golfarelli, D. Maio and S. Rizzi, "Elastic correction of dead-reckoning errors in map building," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 905-911, 1998.

[GN98] J. Gluckman and S.K. Nayar, "Egomotion and omnidirectional cameras," in Proc. 6th Int. Conf. on Computer Vision, Bombay, pp. 999-1005, 1998.

[Goa83] C Goad, "Special purpose automatic programming for 3d model-based vision," in Proc of DARPA Image Understanding Workshop, pp. 94-104,DARPA, 1983.

[Gra89] V. Graefe, "Dynamic Vision System for Autonomous Mobile Robots," Proc IEEE Int. Conf Intelligent Robots and Systems, pp. 12-23 1989.

[Gra92a] V. Graefe, "Vision for Autonomous Mobile Robots," Proc IEEE Workshop Advanced Motion Control, pp. 57-64, 1992.

[Gra92b] V. Graefe, "Driverless Highway Vehicles," Proc. Int. Hi-Tech Forum, pp. 86-95, 1992.

[Gra93] V. Graefe, "Vision for Intelligent Road Vehicles," Proc. IEEE Symp. Intelligent Vehicles, pp. 135-140, 1993.

[Gri90a] W.E.L. Grimson, "Object Recognition by Computer: The Role of Geometric Constraints," The MIT Press, Cambridge, Massachusetts, USA, 1990.

[Gri90b] W.E.L. Grimson, "The combinatorics of object recognition in cluttered environments using constrained search," J. of Artificial Intelligence, 44, pp. 121-165, 1990.

[GS96] J.S. Gutmann and C. Schlegel, "AMOS: Comparison of scan matching approaches for self-localization in indoor environments," in Proc. EUROBOT '96, 1st European Workshop on Advanced Mobile Robots, IEEE Computer Society, 1996.

[GS99] J. Gasos and A. Saffiotti, "Integrating fuzzy geometric maps and topological maps for robot navigation," in Proc. of the 3rd Int. ICSC Symposium on Soft Computing (SOCO'99), pp. 754-760, 1999.

[Guz69] A. Guzman, "Decomposition of a visual scene into three dimensional bodies," in A. Grasselli, editor, Automatic Interpretation and Classification of Images, Academic Press, New York, 1969.

[HGC92] R.I. Hartley, R. Gupta and T. Chang, "Stereo from uncalibrated cameras," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 761-764, 1992.

[HH89] C. Hansen and T.C. Henderson, "CAD based computer vision," IEEE Trans. Pattern Anal. and Machine Intell., 11(11), pp. 1181-1193, November 1989.

[HHC96] S. Hutchinson, G.D. Hager and P. I. Corke, "A tutorial on visual servo control," IEEE Trans. Robotics and Automation, 12(5), pp. 651-669, 1996.

[HK88] R. Hinkel and T. Knieriemen, "Environment perception with a laser radar in a fast moving robot," Symposium on Robot Control, pp. 68.1-68.7, 1988.

[HK95] E. Huber and D. Kortenkamp, "Using stereo vision to pursue moving agents with a mobile robot," IEEE Int. Conf. on Robotics and Automation, May 1995.

[HK97] J. Hertzberg and F. Kirchner, "Landmark based autonomous navigation in sewerage pipes," in Proc. EUROBOT '97, 2nd European Workshop on Advanced Mobile Robots, 1997.

[HK98] E. Huber and D. Kortenkamp, "A behavior based approach to active stereo vision for mobile robots," Engineering Applications of Artificial Intelligence, vol. 11, pp. 229-243, 1998.

[HLM02] J.B. Hayet, F. Lerasle and M. Devy, "A Visual Landmark Framework for Indoor Mobile Robot Navigation," in Proc. of Int. Conf. on Robotics and Automation, Washington, 2002.

[HM92] K. Hughes and R. Murphy, "Ultrasonic robot localization using Dempster-Shafer theory," Neural and Stochastic Methods in Image and Signal Processing, SPIE vol. 1766, pp. 2-11, 1992.

[HORATIO] P.F. McLauchlan, "Horatio: Libraries for Vision Applications," University of Surrey, 1995.

[HP87] C.G. Harris and J.M. Pike, "3D positional integration from image sequences," in Proc. 3rd Alvey Vision Conf., Cambridge, pp. 233-236, 1987.

[HPRSF99] A. Hauck, G. Passig, J. Ruttinger, M. Sorg, and G. Farber, "Biologically Motivated Hand-Eye Coordination for the Autonomous Grasping of Unknown Objects," Autonome Mobile Systeme, pp. 295-300, November 1999.

[HRSF99] A. Hauck, J. Ruttinger, M. Sorg, and G. Farber, "Visual Determination of 3D Grasping Points on Unknown Objects with a Binocular Camera System," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'99), October 1999.

[HS88] C.G. Harris and M. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vision Conf., Manchester, pp. 147-151, 1988.

[HS90] C. Harris and C. Stennett, ""RAPID" a video rate tracker," Proceedings of the BMVC, pp. 73-77, 1990.

[HU88] D.P. Huttenlocher and S. Ullman, "Recognizing solid objects by alignment," in Proc. of the Image Understanding Workshop, pp. 1114-1124, DARPA, April 1988.

[HW96] J.J. Heuring and D.W. Murray, "Visual head tracking and slaving for visual telepresence," in Proc. IEEE Int. Conf. on Robotics and Automation, pp. 515-524, 1996.

[IB96] M. Isard and A. Blake, "Contour tracking by stochastic propagation of conditional density," in Proc. European Conf. on Computer Vision, pp. 343-356, 1996.

[IK88] K. Ikeuchi and T. Kanade, "Automatic generation of object recognition programs," Proc. of the IEEE, 76(8), pp. 1016-1035, August 1988.

[ILW96] P.T. Im, W.S. Lee, and M. Wingate, "A progressive fuzzy clustering approach for the detection of linear boundary and corners," Australian Journal of Intelligent Information Processing Systems, Vol. 3, No. 3, pp. 42-58, Australia, Spring 1996.

[Im97] P.T. Im, "An Enhanced Progressive Fuzzy Clustering Approach To Pattern Recognition," Ph.D. Thesis, Victoria University, Australia, 1997.

[IQWH95] P.T. Im, B. Qiu, M. Wingate and L. Herron, "Linear boundary detection by cluster prototype centring based on fuzzy memberships," Australia New Zealand Intelligent Information Processing Conf., Perth, Australia, pp. 210-212, 1995.

[IQWH95b] P.T. Im, B. Qiu, M. Wingate and L. Herron, "Implementing a neural network for a progressive fuzzy clustering algorithm," IEEE Int. Conf. on Neural Networks, Perth, Australia, vol. III, pp. 1369-1372, 1995.

Bibliography 202

[IWQ95] P.T. Im, M. Wingate, and B. Qiu, "A cluster prototype centring by membership algorithm for fuzzy clustering," 2nd Asian Conf. on Computer Vision, Singapore, vol. 2, pp. 67-70, 1995.

[Jar94] R.A. Jarvis, "Colour Edge Based Quad Vision Ranging," Australian Pattern Recognition Society's Workshop on Colour Imaging and Applications, Canberra, pp. 83-87, 1994.

[Jens00] P. Jensfelt, D. Austin, O. Wijk and M. Anderson, "Feature based condensation for mobile robot localization," in Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 2531-2537, 2000.

[JF93] "Three-Dimensional Object Recognition Systems," Jain and Flynn (Editors), Elsevier, 1993.

[JFN97] M. Jagersand, O. Fuentes, R. Nelson, "Experimental Evaluation of Uncalibrated Visual Servoing for Precision Manipulation," in Proc. of Int. Conf. on Robotics and Automation, pp. 2874-2880, 1997.

[JGCWS97] J. Janet, R. Gutierrez-Osuna, T. Chase, M. White and J. Sutton, "Autonomous mobile robot self-localization using Kohonen and region-feature neural networks," Journal of Robotic Systems, 14(4), pp. 263-282, 1997.

[JPT95] T.M. Jochem, D.A. Pomerleau and C.E. Thorpe, "Vision guided lane transition," IEEE Symposium on Intelligent Road Vehicles, Detroit, MI, pp. 30-35, 1995.

[JW97] C. Jones and M. Wingate, "Robotic Harvesting in a Simulated Environment," 3rd Int. Conf. on Modelling and Simulation, Melbourne, pp. 363-367, Oct. 1997.

[Kal60] R.E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME Journal of Basic Engineering, pp. 35-45, March 1960.

[KB61] R.E. Kalman and R.S. Bucy, "New results in linear filtering and prediction theory," Transactions of the ASME Journal of Basic Engineering, pp. 95-108, March 1961.

[KB91] B. Kuipers and Y. Byun, "A robot exploration and mapping strategy based on a semantic hierarchy of spatial representation," Robotics and Autonomous Systems 8, pp. 47-63, 1991.

[KBM98] D. Kortenkamp, R.P. Bonasso and R. Murphy, "Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems," Cambridge, MA: MIT Press, 1998.

[KG96] S. Koenig, R. Goodwin and R. Simmons, "Robot navigation with Markov models: A framework for path planning and learning with limited computational resources," in Dorst, L., van Lambalgen, M., and Voorbraak, F., eds., Lecture Notes in Artificial Intelligence, Volume 1093, Springer, Berlin, pp. 322-337, 1996.

[KHOROS] Khoros: Commercial image processing software package.

[KHS95] E. Krotkov, M. Hebert and R. Simmons, "Stereo perception and dead reckoning for a prototype lunar rover," Autonomous Robots, 2(4), pp. 313-331, 1995.

[KK92] A. Kosaka and A.C. Kak, "Fast Vision-Guided Mobile Robot Navigation Using Model-Based Reasoning and Prediction of Uncertainties," Computer Vision, Graphics, and Image Processing - Image Processing, vol. 56, no. 3, pp. 271-329, 1992.

[Koh93] T. Kohonen, "Self-Organization and Associative Memory," 3rd ed., Springer, Berlin, 1993.

[Kor93] D. Kortenkamp, "Cognitive Maps for Mobile Robots: A Representation for Mapping and Navigation," Ph.D. Dissertation, Department of Electrical Engineering and Computer Science, University of Michigan, 1993.

[Kos93] A. Koschan, "What is New in Computational Stereo Since 1989: A Survey on Current Stereo Papers," Technical Report 93-22, Technical University of Berlin, Department of Computer Science, August 1993.

[Kos93b] A. Koschan, "Chromatic block matching for dense stereo correspondence," in Proc. 7th Int. Conf. on Image Analysis and Processing 7ICIAP, Capitolo, Monopoli, Italy, pp. 641-648, 1993.

[KS96] S. Koenig and R. Simmons, "Unsupervised learning of probabilistic models for robot navigation," in Proc. of the IEEE Int. Conf. Robotics and Automation, pp. 2301-2308, 1996.

[KS98] S. Koenig and R. Simmons, "Xavier: A robot navigation architecture based on partially observable Markov decision process models," in Kortenkamp, D., Bonasso, R., and Murphy, R., eds., Artificial Intelligence Based Mobile Robots: Case Studies of Successful Robot Systems, MIT Press, 1998.

[Kur96] A. Kurz, "Constructing maps for mobile robot navigation based on ultrasonic range data," IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 26(2), pp. 233-242, 1996.

[KW90] S. King and C. Weiman, "HELPMATE autonomous mobile robot navigation system," in Proc. of the SPIE Conf. on Mobile Robots 2352, pp. 190-198, 1990.

[KW94] D. Kortenkamp and T. Weymouth, "Topological mapping for mobile robots using a combination of sonar and vision sensing," in Proc. of the Twelfth National Conf. on Artificial Intelligence, pp. 979-984, 1994.

[KWN99] C. Kunz, T. Willeke and I. Nourbakhsh, "Automatic mapping of dynamic office environments," Autonomous Robots, pp. 131-142, 1999.

[LBLH02] F. Lerasle, M. Briot, C. Lemaire, and J.B. Hayet, "A Smart Sensor based Visual Landmarks Detection for Indoor Robot Navigation," in Proc. of Int. Conf. on Pattern Recognition, Quebec, 2002.

[LD92a] J.J. Leonard and H. Durrant-Whyte, "Directed Sonar Sensing for Mobile Robot Navigation," Kluwer Academic Publishers, 1992.

[LD92b] J.J. Leonard, H. Durrant-Whyte and I.J. Cox, "Dynamic map building for an autonomous mobile robot," Int. Journal of Robotics Research, 11(4), pp. 286-298, 1992.

[Lee95] D.C. Lee, "The Map-Building and Exploration Strategies of a Simple, Sonar-Equipped Mobile Robot: An Experimental, Quantitative Evaluation," Ph.D. Dissertation, University College London, 1995.

[Lee96] W.Y. Lee, "Spatial Semantic Hierarchy for a Physical Robot," Ph.D. Dissertation, The University of Texas at Austin, 1996.

[Leo90] J.J. Leonard, "Directed Sonar Sensing for Mobile Robot Navigation," PhD thesis, University of Oxford, 1990.

[Li96] F. Li, "Active Stereo for AGV Navigation," PhD thesis, University of Oxford, 1996.

[Lin88] R. Linsker, "Self-organization in a perceptual network," IEEE Computer 21(3), pp. 105-117, 1988.

[LL90] T. Levitt and D. Lawton, "Qualitative navigation for mobile robots," Artificial Intelligence 44, pp. 305-360, 1990.

[LL94] M.F. Land and D.N. Lee, "Where we look when we steer," Nature, 369, pp. 742-744, 1994.

[LM97] F. Lu and E. Milios, "Robot pose estimation in unknown environments by matching 2D range scans," Intelligent and Robotics Systems 18, pp. 249-275, 1997.

[LNH98] T.M. Lee, U. Nehmzow, and R. Hubbold, "Mobile robot simulation by means of acquired neural network models," European Simulation Multiconference, pp. 465-469, 1998.

[LOLA95] "LOLA: Visual Object Detection and Retrieval," a project undertaken at the Centre for Robotics and Intelligent Machines, NC State University, USA. Also, IJCAI-95 Robot Competition and Exhibition, Montreal, Canada, 1995.

[Lon81] H.C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, 293, pp. 133-135, 1981.

[Low85] D.G. Lowe, "Perceptual Organization and Visual Recognition," Kluwer, Boston, 1985.

[Low87] D.G. Lowe, "Three dimensional object recognition from single two dimensional images," J. of Artificial Intelligence, 31, pp. 355-395, 1987.

[Low91] D.G. Lowe, "Fitting parameterized three dimensional models to images," IEEE Trans. Pattern Anal. and Machine Intell., 13(5), pp. 441-450, May 1991.

[LR97] D. Lee and M. Recce, "Quantitative evaluation of the exploration strategies of a mobile robot," Int. Journal of Robotics Research 16(4), pp. 413-447, 1997.

[LRH94] D. Langer, J.K. Rosenblatt and M. Hebert, "A behaviour based system for off road navigation," IEEE Trans. Robotics and Automation, 10(6), pp. 776-782, 1994.

[LSL98] G. Li, B. Svensson and A. Lansner, "Self-orienting with on-line learning of environmental features," Adaptive Behaviour 6(3/4), pp. 535-566, 1998.

[Lun91] A. Lundquist, "Tracking lines using a rigid motion assumption," Master's thesis, Royal Institute of Technology, Report no. TRITA-NA-P9104, Jan 1991.

[LW88] S. Lu and A.K.C. Wong, "Analysis of 3D scene with partially occluded objects for robot vision," in Proc. of the 9th Int. Conf. on Pattern Recognition, Int. Association for Pattern Recognition, The Computer Society Press, pp. 303-308, November 1988.

[MAFMAKHO99] T. Matsui, H. Asoh, J. Fry, Y. Motomura, F. Asano, T. Kurita, I. Hara, and N. Otsu, "Integrated natural spoken dialogue system of Jijo-2 mobile robot for office services," Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pp. 621-627, Florida, July 1999.

[Man82] J. Manyika, "An Information Theoretic Approach to Data Fusion and Sensor Management," PhD thesis, University of Oxford, 1993.

[Mar82] D. Marr, "Vision," MIT Press, Cambridge MA, 1982.

[Mat90] M. Mataric, "Navigating with a rat brain: A neuro-biologically inspired model for robot spatial representation," From Animals to Animats 1, pp. 169-175, 1990.

[May90] P.S. Maybeck, "The Kalman filter: An introduction to concepts," in Cox, I.J. and Wilfong, G.T., eds., Autonomous Robot Vehicles, Springer, Berlin, pp. 194-204, 1990.

[McL92] P.F. McLauchlan, "Horatio: Libraries for vision applications," Technical Report OUEL 1967/92, Dept. Engineering Science, University of Oxford, October 1992.

[MD94] P. MacKenzie and G. Dudek, "Precise positioning using model-based maps," in Proc. of the Int. Conf. on Robotics and Automation, Los Alamitos, CA: IEEE Computer Society Press, pp. 1615-1621, 1994.

[ME85] H. Moravec and A. Elfes, "High resolution maps from wide angle sonar," in Proc. of the IEEE Int. Conf. on Robotics and Automation, St. Louis, Missouri: IEEE Computer Society Press, pp. 116-121, 1985.

[MF92] S.J. Maybank and O. Faugeras, "A theory of self-calibration of a moving camera," Int. Journal of Computer Vision, 8(2), pp. 123-151, 1992.

[MF96] F. Mura and N. Franceschini, "Obstacle avoidance in a terrestrial mobile robot provided with a scanning retina," in Proc. of the 1996 Intelligent Vehicles Symposium, pp. 47-52, 1996.

[MK93a] M. Meng and A.C. Kak, "NEURO-NAV: A Neural Network Based Architecture for Vision-Guided Mobile Robot Navigation Using Non-Metrical Models of the Environment," in Proc. IEEE Int. Conf. Robotics and Automation, vol. 2, pp. 750-757, 1993.

[MK93b] M. Meng and A.C. Kak, "Mobile Robot Navigation Using Neural Networks and Non-Metrical Environment Models," IEEE Control Systems, pp. 30-39, Oct. 1993.

[MM96] P.F. McLauchlan and D.W. Murray, "Active camera calibration for a head-eye platform using a variable state dimension filter," PAMI, pp. 15-22, 1996.

[MM95] P.F. McLauchlan and D.W. Murray, "A unifying framework for structure and motion recovery from image sequences," in Proc. 5th Int. Conf. on Computer Vision, Boston, IEEE Computer Society Press, pp. 314-320, 1995.

[MMD95] V. Morellas, J. Minners and M. Donath, "Implementation of real time spatial mapping in robotic systems through self-organizing neural networks," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 277-284, 1995.

[MMS93] D.W. Murray, P.F. McLauchlan, I.D. Reid and P.M. Sharkey, "Reactions to peripheral image motion using a head/eye platform," in Proc. 4th Int. Conf. on Computer Vision, Berlin, Los Alamitos, CA, IEEE Computer Society Press, pp. 403-411, 1993.

[Mor80] H. Moravec, "Obstacle avoidance and navigation in the real world by a seeing robot rover," Technical Report CMU-RI-TR-3, Carnegie Mellon University, Robotics Institute, September 1980.

[MRD96] D.W. Murray, I.D. Reid and A.J. Davison, "Steering and navigation behaviours using fixation," in Proc. 7th British Machine Vision Conf., Edinburgh, pp. 635-644, 1996.

[MW02] H. Moyssidis and M. Wingate, "A Speech Guided Robot Controller with 3D Graphical Positioning," 4th Int. Conf. on Modelling and Simulation, Australia, pp. 428-433, Nov 2002.

[MZC92] J. Mayhew, Y. Zhang and S. Cornell, "The adaptive control of a four degrees of freedom stereo camera head," Phil. Trans. Royal Society London B, 337, pp. 315-326, 1992.

[Neh92] U. Nehmzow, "Experiments in Competence Acquisition for Autonomous Mobile Robots," Ph.D. Dissertation, Department of Artificial Intelligence, University of Edinburgh, 1992.

[Neh97] U. Nehmzow, "Scientific methods in mobile robotics," in Proc. of TIMR'97, Towards Intelligent Mobile Robots, Manchester, Department of Computer Science, Manchester University, Technical Report Series, ISSN 1361-6161, Report UMCS-97-9-1, 1997.

[New99] P.M. Newman, "On the Structure and Solution of the Simultaneous Localisation and Map Building Problem," Ph.D. Thesis, Oxford University, 1999.

[Nie85] L. Nielsen, "Simplifications in Visual Servoing," Ph.D. thesis, Department of Automatic Control, Lund Institute of Technology, September 1985.

[Nil80] N. Nilsson, "Principles of Artificial Intelligence," Palo Alto, CA: Tioga, 1980.

[Nou98] I. Nourbakhsh, "Dervish: An office-navigating robot," in Kortenkamp, D., Bonasso, R., and Murphy, R., eds., AI-based Mobile Robots: Case Studies of Successful Robot Systems, Cambridge, MA: MIT Press, pp. 73-90, 1998.

[NS92] U. Nehmzow and T. Smithers, "Using motor actions for location recognition," in Varela, F., and Bourgine, P., eds., Toward a Practice of Autonomous Systems, MIT Press, Cambridge MA, 1992.

[NSH91] U. Nehmzow, T. Smithers, and J. Hallam, "Location recognition in a mobile robot using self-organising feature maps," in Schmidt, G., ed., Information Processing in Autonomous Mobile Robots, Springer-Verlag, 1991.

[OHD97] S. Oore, G.E. Hinton and G. Dudek, "A mobile robot that learns its place," Neural Computation 9, pp. 683-699, 1997.

[ON97] C. Owen and U. Nehmzow, "Middle scale navigation - a case study," in Proc. AISB 97 Workshop on Spatial Reasoning in Animals and Robots, Department of Computer Science, Manchester University, Technical Report Series, ISSN 1361-6161, Report UMCS-97-4-1, 1997.

[OUV98] G. Oriolo, G. Ulivi and M. Vendittelli, "Real-time map building and navigation for autonomous robots in unknown environments," IEEE Transactions on Systems, Man and Cybernetics 28(3), pp. 316-333, 1998.

[Owe97] R. Owens, "Lecture Series, Computer Vision Representations," CVonline/Owens/LECT13, 1997.

[PBF90] J.M. Pichon, C. Blanes and N. Franceschini, "Visual guidance of a mobile robot equipped with a network of self motion sensors," SPIE Conf. on Mobile Robots 4, pp. 44-53, 1990.

[Pen87] A.P. Pentland, "Recognition by parts," in Proc. 1st Int. Conf. on Computer Vision, pp. 612-620, 1987.

[PJ96] D.A. Pomerleau and T. Jochem, "Rapidly Adapting Machine Vision for Automated Vehicle Steering," IEEE Expert, Intelligent Systems and Their Applications, vol. 11, no. 2, pp. 19-27, Apr. 1996.

[PM88] H.D. Park and O.R. Mitchell, "CAD based planning and execution of inspection," in Proc. of the Computer Vision and Pattern Recognition Conf., The Computer Society Press, pp. 858-863, June 1988.

[PND96] D. Pagac, E.M. Nebot and H. Durrant-Whyte, "An evidential approach to probabilistic map building," Reasoning with Uncertainty in Robotics, number 1093 in Lecture Notes in AI, pp. 164-170. Springer, 1996.

[Pom89] D.A. Pomerleau, "ALVINN: An Autonomous Land Vehicle in a Neural Network," Technical Report CMU-CS-89-107, Carnegie Mellon Univ., 1989.

[Pom89] D.A. Pomerleau, "ALVINN: An autonomous land vehicle in a neural network," in Touretzky, D., ed., Advances in Neural Information Processing Systems 1, Morgan Kaufmann, 1989.

[Pom93] D.A. Pomerleau, "Neural Network Perception for Mobile Robot Guidance," Kluwer Academic Publishers, 1993.

[Pom94] D.A. Pomerleau, "Reliability Estimation for Neural Network Based Autonomous Driving," Robotics and Autonomous Systems, vol. 12, pp. 113-119, 1994.

[Pop94] A.R. Pope, "Model-Based Object Recognition: A survey of recent research," Technical Report 94-04, January 1994.

[PPKK95] J. Pan, D.J. Pack, A. Kosaka, and A.C. Kak, "FUZZY-NAV: A Vision-Based Robot Navigation Architecture Using Fuzzy Inference for Uncertainty-Reasoning," Proc. IEEE World Congress Neural Networks, vol. 2, pp. 602-607, July 1995.

[PPM90] S.B. Pollard, J. Porrill, and J.E.W. Mayhew, "Experiments in vehicle control using predictive feed-forward stereo," Image and Vision Computing, 8(1), pp. 63-70, 1990.

[PPM91] S.B. Pollard, J. Porrill and J.E. Mayhew, "Recovering partial 3D wire frame descriptions from stereo data," Image and Vision Computing, vol. 9, No. 1, pp. 58-65, 1991.

[PPMF87] S.B. Pollard, J. Porrill, J.E. Mayhew and J.P. Frisby, "Matching geometrical descriptions in three-space," Image and Vision Computing, vol. 5, No. 2, pp. 73-78, 1987.

[PPPBMF87] J. Porrill, S.B. Pollard, T.P. Pridmore, J. Bowen, J.E.W. Mayhew and J.P. Frisby, "TINA: The Sheffield AIVRU Vision System," IJCAI 9, Milan, Italy, 1987, pp. 1138-1144.

[PPPMF88] S.B. Pollard, J. Porrill, T.P. Pridmore, J.E.W. Mayhew and J.P. Frisby, "Components of a stereo vision system," in Robert Bolles and Bernard Roth, editors, Robotics Research, pp. 229-235, Cambridge, Massachusetts, USA, August 1988, National Science Foundation and System Development Corporation, The MIT Press.

[PPT99] S. Pollard, J. Porrill, N. Thacker, "Tina Programmer's Guide," University of Sheffield/Manchester, England, 1999.

[PZ98] P. Pritchett and A. Zisserman, "Wide baseline stereo matching," in Proc. 6th Int. Conf. on Computer Vision, Bombay, pp. 754-760, 1998.

[Rab89] L.R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Proc. of the IEEE, 77(2), pp. 257-286, 1989.

[Ram90] V.S. Ramachandran, "Visual perception in people and machines," in A. Blake, editor, AI and the Eye, Chapter 3, Wiley and Sons, 1990.

[RB95] I.D. Reid and P.A. Beardsley, "Self alignment of a binocular robot," in Proc. 6th British Machine Vision Conf., Birmingham, pp. 443-452, 1995.

[RB95b] S.M. Rowe and A. Blake, "Statistical background modeling for tracking with a virtual camera," in Proc. 6th British Machine Vision Conf., Birmingham, 1995.

[RBFT99] N. Roy, W. Burgard, D. Fox, and S. Thrun, "Coastal navigation: Robot navigation under uncertainty in dynamic environments," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 35-40, 1999.

[RCE82] D. Reilly, L. Cooper and C. Elbaum, "A neural model for category learning," Biological Cybernetics 45, pp. 34-51, 1982.

[RD95] J. Racz and A. Dubrawski, "Artificial neural network for mobile robot topological localisation," Robotics and Autonomous Systems 16, pp. 73-80, 1995.

[Req80] A.A.G. Requicha, "Representations for rigid solids: Theory, methods, and systems," Computing Surveys, 12(4), pp. 437-464, December 1980.

[RG94] U. Regensburger and V. Graefe, "Visual Recognition of Obstacles on Roads," Proc. IEEE Int. Conf. Intelligent Robots and Systems, pp. 980-987, 1994.

[RJ86] L.R. Rabiner and B.H. Juang, "An Introduction to Hidden Markov Models," IEEE Acoustics Speech and Signal Processing ASSP Magazine, ASSP-3(1), pp. 4-16, 1986.

[RK97] S. Rougeaux and Y. Kuniyoshi, "Robust real time tracking on an active vision head," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-6, 1997.

[RM93] I.D. Reid and D.W. Murray, "Tracking foveated corner clusters using affine structure," in Proc. 4th Int. Conf. on Computer Vision, Berlin, Los Alamitos, CA, IEEE Computer Society Press, pp. 76-83, 1993.

[RMB94] I.D. Reid, D.W. Murray and K.J. Bradshaw, "Towards active exploration of static and dynamic scene geometry," IEEE Int. Conf. on Robotics and Automation, San Diego, pp. 718-723, May 1994.

[Rob65] L.G. Roberts, "Machine perception of three-dimensional solids," in J.T. Tippett et al., editors, Optical and Electro-Optical Information Processing, chapter 9, pp. 159-197, MIT Press, 1965.

[Rot96] C.A. Rothwell, "Object Recognition Through Invariant Indexing," Oxford University Press, Oxford, 1996.

[RPB90] M. Rygol, S. Pollard and C. Brown, "A multiprocessor 3D vision system for pick and place," in Proc. of the British Machine Vision Conf., pp. 169-174, 1990.

[Ruc97] W. Rucklidge, "Efficient guaranteed search for gray level patterns," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 717-723, 1997.

[RV81] A.A.G. Requicha and H. B. Voelcker, "An Introduction to Geometric Modeling and Its Application," Mechanical Design and Production, volume 8, chapter 5, pp. 293-328. Plenum Publishing Corporation, 1981.

[RV83] A.A.G. Requicha and H.B. Voelcker, "Solid modeling: Current status and research directions," Computer Graphics and Applications, pp. 25-37, October 1983.

[RZFM91] C.A. Rothwell, A. Zisserman, J.L. Mundy, and D.A. Forsyth, "Efficient Model Library Access By Projectively Invariant Indexing Functions," CVPR(92), pp. 109-114, 1992.

[SA98] A. Schultz and W. Adams, "Continuous localization using evidence grids," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 2833-2839, 1998.

[Saf97] A. Saffiotti, "The uses of fuzzy logic in autonomous robot navigation: A catalogue raisonne," Soft Computing 1(4), pp. 180-197, 1997.

[SAY99] A. Schultz, W. Adams, and B. Yamauchi, "Integrating Exploration, Localization, Navigation and Planning With a Common Representation," Autonomous Robots, vol. 6, No. 3, IEEE Press, pp. 293-308, May 1999.

[SB91] F. Solina and Z. Bjelogrlic, "Methodologies and techniques for interpretation of 3D range images," in P. Plancke et al., editors, First ESA Workshop on Computer Vision and Image Processing for Spaceborne Applications, European Space Agency, June 1991.

[SBK00] O. Schreer, N. Brandenburg, P. Kauff, "A Comparative Study on Disparity Analysis Based on Convergent and Rectified Views," in Proc. of BMVC 2000, 11th British Machine Vision Conf., Bristol, United Kingdom, Sept. 2000.

[SBP97] C.E. Smith, S.A. Brandt, and N.P. Papanikolopoulos, "Eye-In-Hand Robotic Tasks in Uncalibrated Environments," IEEE Journal of Robotics and Automation, Vol. 13, No. 6, pp. 903-914, 1997.

[SD92] G. Schoner and M. Dose, "A dynamical systems approach to task-level system integration used to plan and control autonomous vehicle motion," Robotics and Autonomous Systems 10, pp. 253-267, 1992.

[SD95] A. Stevens, M. Stevens and H. Durrant-Whyte, "'OxNav': Reliable autonomous navigation," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 2607-2612, 1995.

[SD98] S. Simhon and G. Dudek, "Selecting targets for local reference frames," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 2840-2845, 1998.

[Shat98] H. Shatkay, "Learning Models for Robot Navigation," Ph.D. Dissertation, Department of Computer Science, Brown University, 1998.

[Sin99] Virtual Robotics Lab (Programmer P.G. Singh). http://members.tripod.com/gurvinder pec

[SK95] R. Simmons and S. Koenig, "Probabilistic robot navigation in partially observable environments," in Proc. of the 14th Int. Joint Conf. on Artificial Intelligence, pp. 1080-1087, 1995.

[SK97] H. Shatkay and L.P. Kaelbling, "Learning topological maps with weak local odometric information," in Proc. of the 15th Int. Joint Conf. on Artificial Intelligence, pp. 920-929, 1997.

[Smi92] S. Smith, "A new class of corner finder," in Proc. 3rd British Machine Vision Conf., Leeds, pp. 139-148, 1992.

[SMVM93] P. M. Sharkey, D. W. Murray, S. Vandevelde, I. D. Reid and P. F. McLauchlan, "A modular head/eye platform for real-time reactive vision," Mechatronics, 3(4), pp. 517-535, 1993.

[SN00] A. Selinger and R.C. Nelson, "Improving Appearance-Based Object Recognition in Cluttered Backgrounds," TR725, Computer Science Dept., U. Rochester, January 2000.

[SNSDRCCC97] S. Scheding, E.M. Nebot, M. Stevens, H.F. Durrant-Whyte, J. Roberts, P. Corke, J. Cunningham and B. Cook, "Experiments in autonomous underground guidance," in Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1898-1903, 1997.

[SNWGH99] V. Sequeira, K. Ng, E. Wolfart, J.G.M. Goncalves, and D. Hogg, "Automated reconstruction of 3D models from real world environments," ISPRS Journal of Photogrammetry and Remote Sensing, no. 54, pp. 1-22, 1999.

[Sor85] H.W. Sorenson, editor, "Kalman Filtering: Theory and Application," IEEE Press, 1985.

[SSC90] R. Smith, M. Self and P. Cheeseman, "Estimating uncertain spatial relationships in robotics," in Cox, I.J. and Wilfong, G.T., eds., Autonomous Robot Vehicles, Springer, Berlin, pp. 167-193, 1990.

[SSD95] A. Stevens, M. Stevens and H.F. Durrant-Whyte, "OxNav: Reliable autonomous navigation," in Proc. IEEE Int. Conf. on Robotics and Automation, pp. 2607-2612, 1995.

[ST94] J. Shi and C. Tomasi, "Good features to track," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 593-600, 1994.

[Sto96] H.W. Stone, "Mars pathfinder micro-rover: A low cost, low power spacecraft," in Proc. of the 1996 AIAA Forum on Advanced Developments in Space Robotics, Madison, WI, 1996.

[SWG97] E. Shilat, M. Werman and Y. Gdalyahu, "Ridge's corner detection and correspondence," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 976-982, 1997.

[SZ97] C. Schmid and A. Zisserman, "Automatic line matching across views," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 666-672, 1997.

[SZ00] Y. Shan and Z. Zhang, "Corner Guided Curve Matching and its Application to Scene Reconstruction," IEEE Conf. Computer Vision and Pattern Recognition (CVPR'00), South Carolina, Vol. 1, pp. 796-803, June 2000.

[TBBCDFHRRSS99] S. Thrun, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Haehnel, C. Rosenberg, N. Roy, J. Schulte and D. Schulz, "MINERVA: A second generation museum tour-guide robot," in Proc. of the IEEE Int. Conf. on Robotics and Automation, 1999.

[TBBFFHHKS98a] S. Thrun, A. Bucken, W. Burgard, D. Fox, T. Frohlinghaus, D. Henning, T. Hofmann, M. Krell and T. Schmidt, "Map learning and high-speed navigation in RHINO," in Kortenkamp, D., Bonasso, R., and Murphy, R., eds., AI-based Mobile Robots: Case Studies of Successful Robot Systems, Cambridge, MA: MIT Press, 1998.

[TBF98b] S. Thrun, W. Burgard and D. Fox, "A probabilistic approach to concurrent mapping and localisation for mobile robots," Machine Learning 31(5), pp. 1-25, 1998.

[TC92] D.C. Tseng and C.H. Chang, "Color Segmentation Using Perceptual Attributes," Proc. Int. Conf. Pattern Recognition, vol. 3, pp. 228-231, 1992.

[TFB00] S. Thrun, D. Fox and W. Burgard, "Monte Carlo localization with mixture proposal distribution," in Proc. of the National Conf. on Artificial Intelligence, 2000.

[TFZ98] P.H.S. Torr, A.W. Fitzgibbon and A. Zisserman, "Maintaining multiple motion model hypotheses over many views to recover matching and structure," in Proc. 6th Int. Conf. on Computer Vision, Bombay, pp. 485-491, 1998.

[TH84] R.Y. Tsai and T.S. Huang, "Uniqueness and estimation of three dimensional motion parameters of rigid objects with curved surfaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, pp. 13-27, 1984.

[Tha99] N. Thacker, "Tina Algorithms Guide," University of Sheffield/Manchester, England, 1999.

[THK88] C. Thorpe, M. Hebert, T. Kanade, and S.A. Shafer, "Vision and Navigation for the Carnegie-Mellon Navlab," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, no. 3, pp. 362-372, May 1988.

[THO93] J.L. Thomas, A. Hanson and J. Oliensis, "Understanding noise: The critical role of motion error in scene reconstruction," in Proc. 4th Int. Conf. on Computer Vision, Berlin, pp. 691-695, 1993.

[THO94] J.L. Thomas, A. Hanson and J. Oliensis, "Refining 3D reconstructions: A theoretical and experimental study of the effect of cross-correlation," Computer Vision, Graphics, and Image Processing, 60(3), pp. 359-370, 1994.

[Thr98a] S. Thrun, "Bayesian landmark learning for mobile robot localisation," Machine Learning 33(1), pp. 41-76, 1998.

[Thr98b] S. Thrun, "Learning metric-topological maps for indoor mobile robot navigation," Artificial Intelligence 99, pp. 21-71, 1998.

[Tin99] S. Pollard, J. Porrill, N. Thacker, "Tina Programmer's Guide," University of Sheffield/Manchester, England, 1999.

[Tin99a] N. Thacker, "Tina Algorithms Guide," University of Sheffield/Manchester, England, 1999.

[Tin97b] N. Thacker, T. Lacey, D. Prendergast, "Tina User's Guide," University of Sheffield/Manchester, England, 1999.

[TJ] TargetJr - A C++ Computer Vision Environment - a C++ programming environment with libraries to support: image processing; image segmentation; camera modeling; 2D and 3D geometry; and a graphical user interface based on FRESCO. TargetJr has been developed over the last 10 years, starting at GE's Corporate R&D Center. Currently TargetJr is used by a number of vision research groups with emphasis on geometric algorithms and object recognition. TargetJr is written in C++ and organized into a number of libraries including: numerics; spatial objects; image; image processing; segmentation; computational geometry; 3D modeling; and user interface. (By Joseph Mundy, William Hoffman, Andrew Fitzgibbon, Peter Vanroose and Rupert Curwen / GE Corp. R&D, Oxford University, University of Leuven.)

[TK92] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization approach," Int. Journal of Computer Vision, 9(2), pp. 137-154, 1992.

[TK95] C.J. Taylor and D. Kriegman, "Structure and motion from line segments in multiple images," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17, No. 11, pp. 1021-1033, November 1995.

[TKS87] C. Thorpe, T. Kanade, and S.A. Shafer, "Vision and Navigation for the Carnegie-Mellon Navlab," Proc. Image Understanding Workshop, pp. 143-152, 1987.

[TLP99] N. Thacker, T. Lacey, D. Prendergast, "Tina User's Guide," University of Sheffield/Manchester, England, 1999.

[TO93] J.L. Thomas and J. Oliensis, "Automatic position estimation of a mobile robot," IEEE Conf. on AI Applications, pp. 438-444, 1993.

[Tsa87] Roger Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Journal of Robotics and Automation RA-3(4), pp. 323-344, August 1987.

[TZ98] P.H.S. Torr and A. Zisserman, "Robust computation and parameterization of multiple view relations," in Proc. 6th Int. Conf. on Computer Vision, Bombay, pp. 727-732, 1998.

[Ull96] S. Ullman, "High-level Vision: Object Recognition and Visual Cognition," MIT Press, 1996. TR2732, Computer Vision Laboratory, Center for Automation Research, University of Maryland, Aug 1996.

[VXL01] VXL - C++ libraries for Computer Vision - The Vision-something-Libraries are a collection of C++ libraries designed for computer vision research. It was created from TargetJr and the Image Understanding Environment (IUE) with the aim of making a lighter, faster and more consistent system. VXL is written in ANSI/ISO C++ and is designed to be portable over many platforms. It is developed and used by a consortium including groups from the Universities of Leuven, Oxford, Manchester, and RPI, GE CRD. (VXL Consortium.)

[Wal72] D.L. Waltz, "Generating semantic descriptions from drawings of scenes with shadows," PhD thesis, AI Lab, MIT, 1972.

[WD02a] M. Wingate and A.J. Davison, "3D Recognition of Remote Objects Utilizing Stereo Vision on a Roving Platform," ACCV2002, pp. 682-688, Australia, January 2002.

[WD02b] M. Wingate and A.J. Davison, "Object Retrieval using an Active Stereo Head and a Robot Arm," 4th Int. Conf. on Modelling and Simulation, Australia, pp. 417-422, Nov 2002.

[Weh96] R. Wehner, "Middle-scale navigation: The insect case," Journal of Experimental Biology, 1996.

[Wei93] I. Weiss, "Geometric invariants and object recognition," Int. Journal of Computer Vision, 10(3), pp. 207-231, 1993.

[WF97] J. Weng, T.S. Huang, and N. Ahuja, "Motion and structure from two perspective views: Algorithms, error analysis and error estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(5), pp. 451-476, 1989.

[WHA89] H. Wang and J.M. Brady, "Corner detection for 3D vision using array processors," in Proc. BARNAIMAGE 91, Barcelona, Springer-Verlag, 1991.

[Win97a] M. Wingate, "3D Structure Recovery of Objects Using Line Features in Multiple Camera Views," 3rd Int. Conf. on Modelling and Simulation, Melbourne, pp. 386-391, Oct. 1997.

[Win97b] M. Wingate, "3D Structure Recovery of Rigid Objects Using Pan-Tilt-Vergence Stereo," DICTA '97 Conf., Auckland, N.Z., pp. 83-88, Dec 1997.

[Win99] M. Wingate, "Structure Recovery of Objects Using Multiple Camera Views," Post Graduate Seminar Presentation, School of Electrical Engineering, Victoria University, 1999.

[Win02] M. Wingate, "3D Modeling for the Recognition of Remote Objects," 4th Int. Conf. on Modelling and Simulation, Australia, pp. 405-410, Nov 2002.

[WMMW97] P. Whaite and F.P. Ferrie, "Autonomous exploration: Driven by uncertainty," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(3), pp. 193-205, 1997.

[WS96] M. Wingate and A. Stoica, "A Step Towards Robot Apprentices - Visual Learning," in Proc. of the IASTED Int. Conf. Robotics and Manufacturing, pp. 186-189, 1996.

[WS97] M. Wingate and A. Stoica, "An Application of Fuzzy Neurons to Visual Learning," in Proc. of the MS'97 Int. Conf. on Modelling and Simulation, pp. 45-50, 1997.

[WTRM98] S. Waldherr, S. Thrun, R. Romero and D. Margaritis, "Template-Based Recognition of Pose and Motion Gestures On a Mobile Robot," AAAI/IAAI, pp. 977-982, 1998.

[Wv95] G. Weiss and E. von Puttkamer, "A map based on laser scans without geometric interpretation," in Proc. of Intelligent Autonomous Systems 4, Karlsruhe, Germany, pp. 403-407, 1995.

[WWR93] S.W. Wijesoma, D.F.H. Wolfe, and R.J. Richards, "Eye-to-Hand Coordination for Vision Guided Robot Control Applications," Int. J. of Robotics Research, vol. 12, No. 1, pp. 65-78, 1993.

[Yam97] B. Yamauchi, "A frontier-based approach for autonomous exploration," in Proc. of the IEEE Int. Symposium on Computational Intelligence in Robotics and Automation, pp. 146-151, 1997.

[YB96] B. Yamauchi and R. Beer, "Spatial learning for navigation in dynamic environments," IEEE Transactions on Systems, Man and Cybernetics, Section B 26(3), pp. 496-505, 1996.

[Yea88] W. Yeap, "Towards a computational theory of cognitive maps," Artificial Intelligence 34(3), pp. 297-360, 1988.

[YL97] B. Yamauchi and P. Langley, "Place recognition in dynamic environments," Journal of Robotic Systems, Special Issue on Mobile Robots 14(2), pp. 107-120, 1997.

[YSA98] B. Yamauchi, A. Schultz and W. Adams, "Mobile robot exploration and map-building with continuous localization," in Proc. of the 1998 IEEE Int. Conf. on Robotics and Automation, Leuven, Belgium, pp. 3715-3720, 1998.

[YSAGGP97] B. Yamauchi, A. Schultz, W. Adams, K. Graves, J. Grefenstette and D. Perzanowski, "ARIEL: Autonomous robot for integrated exploration and localization," in Proc. of the Fourteenth National Conf. on Artificial Intelligence, pp. 804-805, 1997.

[YZ02] R. Yang and Z. Zhang, "Model-based Head Pose Tracking With Stereovision," in Proc. Fifth IEEE Int. Conf. on Automatic Face and Gesture Recognition (FG2002), pp. 255-260, Washington, DC, May 20-21, 2002.

[Zel92] A. Zelinsky, "Mobile robot exploration algorithm," IEEE Robotics and Automation 8(6), pp. 707-717, 1992.

[ZF92] Z. Zhang and O. Faugeras, "3D Dynamic Scene Analysis," Springer-Verlag, 1992.

[ZF94] C. Zeller and O. Faugeras, "Applications of non-metric vision to some visual guided tasks," Technical Report INRIA Rapport de recherche 2308, INRIA Sophia-Antipolis, 1994.

[ZF94] C.S. Wiles, A. Maki, N. Matsuda and M. Watanabe, "Hyper-patches for 3D model acquisition and tracking," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1074-1080, 1997.

[Zha02a] Cha Zhang, "A Survey on Stereo Vision for Mobile Robots," Technical Report, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA, 2002.

[Zha02b] Z. Zhang, "Vision-Based Modeling and Interaction," in the Fifth Asian Conf. on Computer Vision (ACCV), Invited paper, Melbourne, Australia, January 2002.

[Zim95a] U.R. Zimmer, "Adaptive Approaches to Basic Mobile Robot Tasks," Ph.D. Dissertation, University of Kaiserslautern, 1995.

[Zim95b] U.R. Zimmer, "Self localisation in dynamic environments," IEE/SOFT Int. Workshop, 1995.

[Zim96] U.R. Zimmer, "Robust world modeling and navigation in a real world," Neurocomputing, Special Issue 13, pp. 2-4, 1996.

[Zim97] U.R. Zimmer, "General and specific models in complex robotics systems - a critique and a proposal," Technical Report 1063, GMD, 1997.

[ZW02] M. Zulli and M. Wingate, "A Micro Mobile Surveillance System with Wireless Video," 4th Int. Conf. on Modelling and Simulation, Australia, pp. 452-457, Nov 2002.