
ORIGINAL ARTICLE

Three-dimensional object recognition system for enhancing the intelligence of a KUKA robot

S. Q. Xie & D. Cheng & S. Wong & E. Haemmerle

Received: 13 March 2007 / Accepted: 23 May 2007 / Published online: 7 July 2007
© Springer-Verlag London Limited 2007

Abstract Machine intelligence has been a research hotspot in mechatronics in recent years. This research presents a 2D/3D object recognition system for enhancing the intelligence of an industrial robot (KUKA robot). The image processing and object recognition algorithms were developed using the software packages VisionPro and LabVIEW. Experiments were carried out to verify the performance of the system. The system is able to recognise any general 2D object within six seconds. Its performance on 3D objects is slower than on 2D objects and is largely affected by the number of trained images stored in the database, the complexity of the object, and the presence of similar objects in the database. Despite the complexity of the objects being recognised, both the overall accuracy and success rate of the system are close to 100%. The developed system proved to be robust and allows for automatic recognition of objects in the manufacturing environment described in this paper.

Keywords Machine vision · LabVIEW · KUKA robot · Image processing · Image recognition

1 Introduction

Two/three-dimensional (2D/3D) object recognition is one of the major research topics in recent years in the areas of robotics and automation. The aim is to simulate the visual system of humans and allow intelligent machines to be built. With higher intelligence, difficult tasks that require the judgment of human eyes can be performed by machines. Possible applications include automatic manufacturing [1], product inspection [2–5], counting and measuring [3], medical surgery [6, 7], 3D modelling of physical objects in reverse engineering [8–10], human face/figure recognition in surveillance systems [11–14] and indoor/outdoor environment mapping in navigation and collision avoidance systems [15, 16].

The development of 2D/3D object recognition systems for industrial robots has gained particular interest in industry recently as the world steps into the century of automation. Tedious, repetitive and dangerous assembly tasks are not favoured by humans. Labour resources are often expensive and scarce. High and consistent work quality is hard to achieve with manual labour. In order for manufacturers to survive in a competitive market, they need to provide large variations of product types, cut production costs, reduce lead time and improve product quality frequently. The use of flexible automation systems in assembly processes has thus become the trend. However, without the employment of vision systems, objects' orientations, dimensions, shapes, patterns and other details, which are important in assembly tasks, are difficult for machines to determine. Common ways to obtain this information include measuring the weight and overall dimensions using a variety of sensors. These methods are very limited and inaccurate, and can restrict the work of such machines to repetitive jobs that require little intelligence or judgement. Having a vision system can enhance machines' capabilities. Moreover, in the real world, objects exist in both 2D and 3D; therefore it is necessary for a system to handle both 2D and 3D objects [17, 18].

Int J Adv Manuf Technol (2008) 38:822–839. DOI 10.1007/s00170-007-1112-y

S. Q. Xie (*) · D. Cheng · S. Wong · E. Haemmerle
Department of Mechanical Engineering, The University of Auckland, Private Bag 92019, Auckland, New Zealand
e-mail: [email protected]

This paper presents a computer vision system that is capable of recognising general 2D and 3D objects using an industrial charge-coupled device (CCD) monochrome camera. Object recognition algorithms are developed to recognise 2D and 3D objects of different geometry. The system is designed, constructed, and implemented on an industrial robot manufactured by Keller und Knappich Augsburg (KUKA) Robotics. This enhances the intelligence of the KUKA robot so that it can recognise different objects accurately and efficiently. The image processing software package VisionPro developed by Cognex [19] and the programming language Laboratory Virtual Instrumentation Engineering Workbench (LabVIEW) developed by National Instruments (NI) [20] are used. The motivation behind this work is to enable the KUKA robot to recognise different objects and respond according to the object identified in a flexible and automated manufacturing environment.

2 Related work

2D object recognition has been well researched, developed and successfully applied in many applications in industry for a number of years. 3D object recognition, on the other hand, is relatively new, and a number of issues still exist. The main issue involved in 3D recognition is the large amount of information which needs to be dealt with. In many 2D cases the recognition algorithm only needs to deal with translation and rotation of the object; however, in 3D recognition systems there are an infinite number of possible viewpoints, making the matching of the information of the object obtained by the sensors to the database difficult [21]. Different types of sensors can be used for obtaining 3D information of objects. These can be divided into contact and non-contact methods. Contact methods use a mechanical probe to physically touch and measure the location of discrete points on the object and interpolate between the points to form a surface [1]. The coordinate measurement machine (CMM) is perhaps the most widely used method for measuring 3D information using contact measurement methods [22, 23].

Non-contact methods, on the other hand, do not come in contact with the object of interest and instead use range sensors or cameras to obtain information about the object. The main advantage of non-contact methods is the non-obstructive approach to obtaining an object's surface information. As the sensors do not come in contact with the object, there is no risk of damaging the object or the sensors. Also, it is possible to use range sensors to measure surfaces which contact sensors cannot reach. From a manufacturing point of view, it is also desirable to use non-contact methods, since they often do not involve moving parts. This could potentially speed up the manufacturing process.

Vision systems are a subset of the non-contact approach for obtaining object information. This is the approach chosen for the system described in this paper, due to its non-obstructive nature and its ability to capture information of a large area at once. Vision approaches can be further divided into two types, namely active vision and passive vision systems.

Object recognition approaches can be divided into two categories. The first approach utilises appearance features of objects such as colour and intensity. The second approach utilises features extracted from the object and only matches the features of the object of interest with the features in the database. An advantage of the feature-based approaches is their ability to recognise objects in the presence of lighting, translation, rotation and scale changes [24, 25]. This is the type of approach used by the PatMax algorithm in this system.

2.1 Vision systems

2.1.1 Active vision systems

Active vision systems refer to the case when a camera and a projector are used [26]. The projector is used to project a light pattern onto the object while the camera acquires an image of the object with the projected pattern. An example of this is the moiré pattern [27]. This is a widely used approach for 3D object recognition systems, as it is possible to obtain 3D surface information of the object by knowing what light pattern is projected onto the surface and analysing the shape of the light pattern in the acquired image. A downside of this approach is the need for extra hardware, which is not always desirable. In many real-life applications, it is important to reduce production and inspection costs, and introducing additional hardware will increase cost. In this research, active vision systems are not considered, in order to minimise the amount of hardware required.

2.1.2 Passive vision systems

Passive vision systems refer to the case when one or more cameras are used to acquire images of the object. These types of systems differ from active vision systems in that it is possible to obtain all the necessary images of the object by the use of a single camera. These types of approaches often do not obtain the same amount of information about the object as active vision systems do, due to the difference in hardware used. Smart algorithms are often required to make up for this difference. The proposed system belongs to this category. The presented system is capable of achieving a high success rate for the majority of the objects with high accuracy by utilising sensor planning techniques together with a complex object recognition algorithm.

2.2 Feature-based recognition

The feature-based recognition approach, as its name suggests, utilises features extracted from the object for recognition. A number of different types of features can be used for recognition, and they can be categorised as point features, line features, and region features. Recently, a new type of feature, called local descriptors, has also been widely used in the field of object recognition and image registration. Unlike conventional features, local descriptors are designed to be invariant to different types of transformations, depending on the focus of the descriptor [28, 29]. It should be noted that object recognition is considered a specific case of image registration, and while many algorithms presented by authors only deal with object recognition, they can be easily extended to deal with image registration problems. This section first discusses some recent advances in image registration techniques, followed by a review of recent object recognition systems used in robotic systems.

2.2.1 Image registration

Image registration is the process of identifying corresponding appearances or features from two or more images [30], usually from images taken under different conditions including [25]: (1) multiview (different viewpoints); (2) multitemporal (different times); and (3) multimodal (different types of sensors). Of the different cases above, multiview analysis is of interest in this research. Multiview analysis deals with images taken from different viewpoints using the same type of sensors. In the case where a single sensor is used to acquire multiple images of the same object, the temporal change between two successive images is often considered negligible.

Due to the vast amount of work done in image registration and object recognition, it is not possible to give a comprehensive review of the work done in this field. Instead, an overview of recent developments in object recognition systems in robotics is given, and readers interested in image registration are referred to the following works for a detailed discussion on the state of the art in the different types of approaches utilised in image registration problems. Brown [30] presented a detailed discussion on the different types of image registration techniques; this is, however, fairly outdated. Zitová and Flusser [25] carried on where Brown left off, and their work covers many recent works in image registration. Goshtasby [31] also presented a detailed discussion on some 2D and 3D image registration techniques, and his work is slightly more up to date than Zitová and Flusser's. Bowyer et al. [11] provided a good survey on face recognition approaches. Lastly, a number of comparison works have been done by Mikolajczyk and Schmid [32, 28], which discussed and provided experimental results comparing the various local descriptors in recent years.

2.2.2 Robot object recognition

This section discusses some recent works on object recognition systems in robotics. A number of the systems discussed in this section show that, with the advancement of robots and vision systems, object recognition which utilises robots is no longer limited to manufacturing environments. Wong et al. [21] developed a system which uses spatial and topological features to automatically recognise 3D objects. A hypothesis-based approach is used for the recognition of 3D objects. Firstly, a search tree is generated from the image and model features; then an unmatched image feature is compared with the 3D model features and a hypothesis is generated for the match, which is used to estimate a 2D description of the object based on a perspective projection. The reliability of the match is then computed and a score is produced. This process is repeated for all features identified in the image, and the match with the best score is taken as the correct match. This system does not take into account the possibility that an image may not have a corresponding model in the database, since the best matching score is always used to determine the correct match, and thus it is prone to false matches if the object in an image is not present in the database.

Büker et al. [33] presented a system where an industrial robot is used for the autonomous disassembly of used cars, in particular the wheels. A stereo vision system was used as guidance for the robot arm. The system utilised a combination of contour, grey-value and knowledge-based recognition techniques. Principal component analysis (PCA) was used to accurately locate the nuts of the wheels, which were used for localisation purposes. Finally, a coarse-to-fine approach was used to improve the performance of the system. The vision system was integrated with a force-torque sensor, a task planning module, and an unscrewing tool for the nuts to form the complete disassembly system.

A number of recent works have used methods which are invariant to scale, rotation and translation, and partially invariant to affine transformation, to simplify the recognition task. These methods allow the object to be placed in an arbitrary pose. Jeong et al. [34] proposed a method for robot localisation and spatial context recognition. The Harris detector [35] and the pyramid Lucas-Kanade optical flow methods were used to localise the robot. To recognise spatial context, the Harris detector and scale invariant feature transform (SIFT) descriptor [36] were employed. Peña-Cabrera et al. [37] presented a system to improve the performance of industrial robots working in unstructured environments. An artificial neural network (ANN) is used to train and recognise objects in the manufacturing cell, which is invariant to the transformations listed above. The object recognition process utilises image histograms and image moments which are fed into the ANN to determine what the object is. The system was tested in a manufacturing cell and achieved high identification rates within a short time frame.

In Abdullah et al.'s work [38], a robot vision system was successfully used for sorting meat patties. A modified Hough transform [39] was used to detect the centroid of the meat patties, which is used to guide the robot to pick up individual meat patties for sorting. The image processing was embedded in a field programmable gate array (FPGA) for online processing. This system overcomes a number of issues related to existing food processing by human labour, such as hygiene standards, accuracy and speed.

3 Overall system structure

The main objectives of this research are to investigate the recent approaches, methods and tools used in object recognition; to design and develop a computer vision system that is able to recognise general 2D and 3D objects using the vision tools provided by the VisionPro software package and the programming language LabVIEW; and to implement the computer vision system on the KUKA robot and establish communication between the vision system and the KUKA robot so that the robot can recognise different objects and respond to them appropriately in a manufacturing environment.

The overall system structure is illustrated in Fig. 1. It consists of a number of hardware units, including the vision computer, the video camera, the KUKA robot controller and the KUKA robot. Each of these carries out a unique set of inter-related functions.

Fig. 1 The overall system structure (arrows indicate communications). The LabVIEW program is the core of the system and communicates with VisionPro's VisionServer and the motion control program on the KUKA robot controller, which in turn communicate with the camera and KUKA robot, respectively

The vision system on the vision computer is integrated with a number of hardware and software components. The hardware is used to provide physical connections between the different parts of the system, collect useful information for the object recognition process and perform the movements necessary to facilitate the process. The software is used to carry out image processing tasks, enable communication between modules, make decisions for the system and control the hardware. Several components are installed on this computer vision system. These include the VisionPro software package, a video acquisition card, and a digital input/output (I/O) card. The VisionPro software package offers all the necessary tools and functions for image processing. The QuickStart and VisionServer components are primarily used to build the vision tasks for object recognition. They perform the image processing tasks only and are executed on the vision computer to share the workload of the KUKA robot controller.

The video acquisition card is used to control the camera to capture images. The card is a Cognex MVS-8500 series frame grabber, which is supported by the VisionPro software package. When the image capture function is called, a signal is sent to the card, which triggers the camera to take a photo and send it back to the vision software for analysis. This card is able to support up to four high-speed progressive scan cameras.

The communication between the vision computer and the KUKA robot controller is achieved through either object linking and embedding (OLE) for process control (OPC) or the transmission control protocol and Internet protocol (TCP/IP). The LabVIEW program is used to control this process on the vision computer side. It also acts as the control centre, which determines the next operation of the system, displays the status of the object recognition process and outputs results to users.

The object recognition process undergoes five main stages: (1) camera calibration; (2) image acquisition; (3) image and information processing; (4) object recognition; and (5) results output. Two-dimensional objects are easy to handle since only the top views need to be considered and the camera can be placed directly above the objects. 3D objects, on the other hand, are more difficult to deal with, since these objects have several faces and many different sitting positions. The appearance of 3D objects can vary greatly when viewed from different camera poses. Recognising 3D objects is hard to achieve when only a single view is available, either because there is not sufficient feature information to separate one object from another similar object in the database, or because the image is corrupted by clutter and occlusion. Furthermore, images taken by a camera are 2D only. In order to tackle the 3D problem, it is necessary to take images of an object from different viewpoints and combine the information together.

3.1 System setup

In order to allow the KUKA robot to take images of objects, the camera is fixed on the end-effector of the KUKA robot as shown in Fig. 2. The KUKA robot is a very flexible device which has six degrees of freedom (DOF). Its arm can move around the object and allows images to be taken from a large number of camera poses. By installing the camera on the end-effector of the KUKA robot, it is possible to make full use of the robot by allowing both the camera and any other necessary tools to be installed concurrently. This has the advantage that both active and passive vision systems can be used. Since the robot can always be programmed to go back to the same position, it can also be treated as a fixed system. The disadvantage of placing the camera on the robot's end-effector is that vibration of the camera might affect the quality of the images taken. This problem is solved by installing a shock absorber between the end-effector of the robot and the camera.

The camera used is a progressive scan CCD monochrome camera. Figure 3 shows an image of the camera installed on the robot. A monochrome camera was chosen to reduce the size of the image files. In the object recognition process, many images of the objects taken from different viewpoints need to be pre-obtained and stored in the database so that they can be compared with the run-time images of the objects. Bitmap (BMP) and image database (IDB) are the only two image formats supported by VisionPro. Images stored in these formats are large in size since they cannot be compressed. Therefore, in order to conserve both the hard disk and memory usage of the vision computer, grey-scale images were used in the image processing stage in place of full red, green, and blue (RGB) images. This also has the advantage that the vision tasks can be performed faster, since grey-scale images are one third the size of RGB images, assuming the same number of bits is used for each colour channel.
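As a quick check of the storage argument, the short sketch below compares uncompressed image sizes for an assumed 640×480 sensor with 8 bits per channel; the resolution is only an illustrative assumption, as the paper does not state the camera resolution.

```python
# Rough storage comparison for uncompressed images; the 640x480 resolution
# is an illustrative assumption, not a figure from the paper.
width, height = 640, 480
grey_bytes = width * height * 1   # one 8-bit channel
rgb_bytes = width * height * 3    # three 8-bit channels
print(grey_bytes, rgb_bytes, grey_bytes / rgb_bytes)  # 307200 921600 0.333...
```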

One issue with the CCD camera used in this research is that, because the focal length cannot be adjusted automatically, any adjustments need to be made manually. This makes it impossible to adjust the focal length at run-time. Since the quality of the images is affected by whether the camera is focused or not, it was necessary to find out the tolerable range of the camera for different focal lengths. Experiments were conducted to determine this, which involved moving an object forward and backward from a pre-set focal point until the camera could not see the patterns on the object clearly. The maximum forward and backward distances were recorded for each focal length. It was found that the tolerable range increased proportionally with the increase in focal length. These results are important for determining what distance the camera should be placed away from the objects for taking images, as will be discussed shortly.

3.2 KUKA robot

The KUKA robot has six DOFs and a maximum payload of 15 kg at the nominal distances of 150 mm and 120 mm in the z and y directions, respectively [40]. In this system, the processing time is one of the key factors in determining the performance. If the robot moves too slowly, it will increase the overall time. However, if the robot moves too fast, vibrations will affect the quality of the images taken. Therefore, the velocity and acceleration of the KUKA robot have to be carefully considered.

Fig. 2 Schematic diagram showing the system setup, which consists of the KUKA robot, the camera fixed on the end-effector of the robot, and the object to be recognised

Fig. 3 Photo of the camera fixed on the end-effector of the KUKA robot

In many object recognition systems, it is important to keep track of the camera poses for taking images. An appropriate viewpoint planning algorithm is required to decide where the camera viewpoint should be moved to get the next best view of an object, if the first image does not provide sufficient information to identify what the object is. This can prevent the camera from taking images from an inadequate viewpoint. In this research, the KUKA robot was only allowed to go to one of three pre-defined viewpoints to capture images. Three fixed viewpoints exist, one on top and two on the sides of the object. These fixed viewpoints were around 800–900 mm away from the object to allow the camera to focus on the object. They are perpendicular to each other and allow a maximum object size of approximately 0.12×0.15×0.15 m (length×width×height) to be recognised. This size was restricted by the focal length of the camera as discussed previously. Figures 4, 5 and 6 show the three positions.

4 System design issues

There are a number of software packages involved in this research. The main roles of the software are controlling the operation of the whole system, enabling communication between different parts and carrying out object recognition tasks. The software used to facilitate the construction of the system includes the VisionPro software package, the LabVIEW programming language, and the KUKA robot programming language.

VisionPro is a very comprehensive software package for machine vision applications. It contains a collection of Microsoft's Component Object Model (COM) components and ActiveX controls that allows the development of vision applications in Visual Basic and Visual C++ programming environments with great flexibility. A wide range of tools and functions are provided to facilitate the different stages of an object recognition process, such as calibration, image acquisition, image training and analysis, geometric pattern matching and inspection, and perspective distortion correction.

There are two components offered by the VisionPro software package which allow fast construction of vision applications without the need for programming in either Visual Basic or Visual C++. These are the QuickStart and VisionServer applications, and they have been used extensively to build the vision tasks for object recognition in this research.

The QuickStart application offers a user-friendly graphical environment that allows the construction of vision tasks in a fast and simple manner. It contains a tool group edit control window, which provides the place for constructing and editing vision tasks. A vision task can be created by dragging and linking the necessary vision tools from the list in the tool box onto the edit control window. All the vision tasks created in QuickStart are saved in the VisionPro Persistence (VPP) format.

Fig. 4 Photo of position 1 (top view)

Fig. 5 Photo of position 2 (front side view)

Fig. 6 Photo of position 3 (right side view)


The VisionServer is another useful component provided by VisionPro that allows vision tasks to communicate with other applications on the same or external computers. Since no communication tool is provided in QuickStart, VisionServer must be used to transfer results generated from the vision tasks to other parts of the system. There are two communication methods supported by VisionServer, namely OPC and TCP/IP.

4.1 Communication between robot controller and VisionServer

Communication between the KUKA robot controller and the vision computer is a major part of this work. Data need to flow between the systems so that the robot can move to the desired positions and allow the camera to take images at the right viewpoints for analysis. The robot controller and the vision tasks built on the QuickStart component of VisionPro are two separate modules in the system and cannot communicate with each other directly. To overcome this, communication is achieved by the use of VisionServer.

Choosing an appropriate communication method was an important task because it affects the speed of data transfer, the complexity of the software structure/arrangement, future expandability, reliability, and the performance of the system. Two communication methods were tested in the system. These are discussed in detail below.

4.1.1 OPC communication method

OPC utilises Microsoft's COM and Distributed COM (DCOM) technologies to enable applications to exchange data on one or more computers using the client-server architecture [41]. OPC runs on top of the TCP/IP protocol and allows multiple clients to connect to one server, forming a communication network. This is an advanced method of data transfer and is commonly employed in robotics applications [42]. The KUKA robot controller contains settings which enable any motion control program to retrieve information from the OPC server in the KUKA robot controller directly. The OPC communication method for this system is illustrated in Fig. 7.

Fig. 7 The OPC communication approach for this system

Using the OPC communication method, the KUKA robot controller is treated as the server. A network card is needed to set up the connection. The OPC server on the robot controller contains a list of variables which can be read and/or written by clients connected to it. When the KUKA robot is ready to start, it sends a command to trigger the VisionServer to take an image of the object and perform a vision task. Firstly, the motion control program writes to the command variable on the OPC server. The VisionServer, which is configured as an OPC client, then reads the new command and performs the task. After the vision task is finished, the VisionServer transmits the results back to the robot controller by writing the results to the other variables on the OPC server. These data are then available for pick-up and analysis by the motion control program. The motion control program then decides whether it has obtained enough information to identify the object or not. If there is enough information, the system outputs the result to the users on the screen; otherwise, it moves the robot to another position to take an image and repeats the process.
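The handshake described above can be summarised as a simple read/write loop over shared OPC variables. The sketch below is a minimal illustration of that logic only; the OpcClient class, the variable names ("command", "model", "score") and the polling scheme are hypothetical stand-ins, not the KUKA or VisionPro OPC interfaces.

```python
import time

class OpcClient:
    """Hypothetical in-memory stand-in for an OPC client/server pair; a real
    system would use an OPC library to reach the server on the robot controller."""

    def __init__(self):
        self._vars = {"command": "", "model": "", "score": 0.0}

    def write(self, name, value):
        self._vars[name] = value

    def read(self, name):
        return self._vars[name]

def request_vision_task(opc, task_index, timeout_s=10.0):
    """Motion-control side of the handshake: write a command, then poll for results."""
    opc.write("command", f"{task_index} run")   # trigger the vision task
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if opc.read("command") == "":           # cleared once the vision side is done
            return opc.read("model"), opc.read("score")
        time.sleep(0.1)
    raise TimeoutError("vision task did not complete in time")

def vision_side_step(opc, run_task):
    """VisionServer side: consume a pending command, run the task, publish results."""
    cmd = opc.read("command")
    if cmd.endswith("run"):
        model, score = run_task(int(cmd.split()[0]))
        opc.write("model", model)
        opc.write("score", score)
        opc.write("command", "")                # signal completion
```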

This method is the most straightforward. It is simple to set up, reliable, fast in data transfer, and involves the fewest software components. It also allows other applications or systems to be added very easily in the future, because OPC is a standardised communication method for process control data in industry [42].

4.1.2 TCP communication method

TCP/IP is a common method used to transmit data over networks. IP packages data into components called datagrams. A datagram contains the data and a header that indicates the source and destination addresses. In a transmission, IP simply determines the correct path for the datagram to take across the network and sends the data to the specified destination. However, it cannot guarantee delivery. In fact, IP might lose the datagram or deliver a single datagram more than once if the datagram is duplicated in transmission. Thus, it is rarely used in automation programs. TCP is similar to IP but is more reliable. It retransmits the datagram until it receives an acknowledgement from the receiver. TCP ensures reliable transmission by delivering data in sequence without errors, loss, or duplication [41, 43]. For these reasons, only TCP is considered in this research. Figure 8 shows how TCP can be used to connect the two computers together in the system.

Fig. 8 The TCP communication approach for this system. This is the protocol adopted in the system; note the extra communication layers compared with the OPC communication approach shown in Fig. 7

This method is similar to the OPC method above, except that the VisionServer is set up as a TCP server and there is one additional LabVIEW application involved. The VisionServer transfers data to and from the KUKA side using the TCP protocol. However, the KUKA controller cannot pick this up without a TCP client. The LabVIEW application acts as a TCP client to receive the data from the sender. An OPC client is also included in this application program to transfer the received data to the OPC server using the OPC protocol. This process is needed because the motion control program can only read data from the OPC server. Signals from the motion control program can also be sent back to the vision computer using the same path in the reverse direction. This protocol is used in this research to ensure data are correctly transferred between images.
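A minimal sketch of the vision-computer side of this exchange is shown below using a plain TCP socket. The host, port and text-based reply format are illustrative assumptions only; the command string mirrors the "0 run" example quoted later in this paper, but the actual wire format used by VisionServer and the LabVIEW TCP client is not specified here.

```python
import socket

def run_vision_task(host, port, task_index, timeout_s=10.0):
    """Send a '<index> run' command to a TCP server and return its raw reply.

    The host, port and reply format are assumptions for illustration; only the
    command format follows the '0 run' example described in this paper.
    """
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(f"{task_index} run\n".encode("ascii"))
        reply = sock.recv(4096).decode("ascii").strip()
    return reply  # e.g. a string carrying the model name, score1/score2 and pose

# Example call (assumed address of the vision computer running VisionServer):
# print(run_vision_task("192.168.0.10", 3000, 0))
```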

4.2 Structure of the vision application

The vision application is used to perform image processing in the system. Nine vision tasks are combined inside the VisionServer to form this application. It can recognise three 2D objects and five or fewer 3D objects, depending on the number of sitting bases allowed for each of the objects. Images of the objects to be recognised by the system have to be pre-acquired, trained and stored inside the vision tasks for matching at run-time. Each of the nine vision tasks in the VisionServer is given an index number, ranging from 0 to 8. They all have a similar structure and will only be executed when commands are received from the LabVIEW program; for example, '0 run' will execute vision task 0.

Vision task 0 is used to store the top views of all the models to be recognised, that is, the face of the objects seen by the camera from position 1. This task is always executed at the start of a new object recognition process to check which model the object being recognised looks like. The other eight vision tasks are used to store the side views of the models, that is, the views which can be seen by the camera at positions 2 and 3. Side views from the same model are grouped together inside one vision task. For example, if the top view of an object is stored as the fifth image in vision task 0, then all its side views will be stored in vision task 5. In each vision task, up to eight side views for each model can be stored. The more side views the vision tasks store, the lower the failure rate of the system. However, by having more views stored in the vision tasks, the processing time is increased. These eight vision tasks are only used to analyse the side views of the object being processed, and not all of them are executed in every object recognition process. Only the tasks whose corresponding top views look like the top view of the object being processed are executed.
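The indexing scheme above can be pictured as a small lookup structure: task 0 holds every model's top view, and the i-th top view in task 0 maps to the side views stored in task i. The sketch below is only an illustrative data layout and command formatter; the model names are invented, and only the "<index> run" string follows the command format quoted above.

```python
# Illustrative layout of the nine vision tasks (indices 0-8); the model names
# are invented examples, not the objects actually trained in this research.
vision_tasks = {
    0: {"role": "top views of all models", "images": ["modelA_top", "modelB_top"]},
    1: {"role": "side views of model 1",   "images": ["modelA_side1", "modelA_side2"]},
    2: {"role": "side views of model 2",   "images": ["modelB_side1", "modelB_side2"]},
    # ... tasks 3-8 follow the same pattern, one task per trained model, so the
    # i-th top view in task 0 corresponds to the side views stored in task i.
}

def run_command(task_index):
    """Format the command string the LabVIEW program sends, e.g. '0 run'."""
    return f"{task_index} run"

print(run_command(0))  # '0 run' executes the top-view task at the start of a cycle
```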

In each of these vision tasks, there is an AcqFifo tool, an ImageFile tool, and a number of PMAlign tools, depending on the number of images that need to be stored. The AcqFifo tool is used to acquire images from the video camera through the frame grabber card. It also provides other functions for configuring the frame grabber card, displaying the live image of the camera, viewing the last image the camera took, and setting the lighting parameters.

The ImageFile tool is used to create image files and to save and load images from new or existing image files. It stores images for each of the vision tasks and acts as an image database. However, images loaded by the ImageFile tool are not used directly to match against the object being recognised at run-time. Instead, only the images which are trained and loaded by the PMAlign tools are used. In other words, this tool is not needed at run-time after the images have been trained. However, it is still included in each of the vision tasks so that users can choose to use other images to recognise objects later on.

The PMAlign tool is a powerful tool and is the tool used to carry out the image processing and object recognition stages. A whole image or a specific region of an image has to be trained and supplied to the tool before it can operate. The tool simply searches for and matches the trained region or pattern in the images taken during run-time. Different searching algorithms can also be selected. In this work, the PatMax algorithm provided in VisionPro is employed due to the following characteristics [44]:

– It allows trained objects or patterns to be searched for and matched with run-time images with high accuracy and high speed, even when the images are rotated, scaled, and/or stretched.

– It recognises objects based on their overall shapes and features rather than grey-scale intensity values. This means that it is partially invariant to different lighting conditions and has a high resistance to clutter.

– When the algorithm finds the trained object in a run-time image, it is able to return the object's true location and orientation. Furthermore, it is also capable of computing a score between 0.0 and 1.0 to indicate how closely the view of the object at run-time matches the trained views after accounting for the transformation that the object has undergone. These results are displayed on the result tab of the PMAlign tool and are output to the LabVIEW program through the TCP communication function provided in VisionServer.

The PatMax algorithm used in the PMAlign tool is shown in Fig. 9. An overview of the process is presented below, which discusses the recognition process as a whole. The feature detection stage detects point features of the object, where for each point the position and orientation information is also stored. The features detected are often of different sizes; this is an important factor in the object recognition process implemented, and the reasons for this will be discussed shortly. After the features have been detected, the training image then undergoes a training stage. This stage uses one of the two following as input: (1) a training image or (2) a shape model. Shape models are geometric descriptions of the run-time image and can be constructed from geometric primitives provided by the VisionPro software package or from computer-aided design (CAD) models. Shape models are not used in this system since all the training is done through the use of training images. Given the training image, the training algorithm constructs an internal geometric representation of the features identified in the feature detection stage. The output from this is a trained pattern, which is a geometric description of the model. The advantages of using a geometric description of the model instead of directly comparing the images have been discussed in § 2.

Fig. 9 The PatMax image recognition algorithm (reproduced from [44])

The last stage in the object recognition process is to search for a model stored in the database, given a run-time image. In the search stage, the algorithm first identifies the large features and uses these to locate the object in the run-time image. The small features are then used to determine the precise location of the object. This approach allows fast matching of the run-time image with the training images. In addition to the features listed above, the PMAlign tool also contains settings to limit the amount of rotation and transformation that the tool should consider for a view of an object. By defining constraints on the algorithm, the tool is able to process much faster, since there is less information which needs to be considered. The disadvantage of defining a constraint is that it limits the possible poses of the objects. Table 1 shows the PMAlign tool settings for the different viewpoints, chosen to minimise the time required while maximising the accuracy of the system and avoiding any limitations on the pose of objects for recognition.

Table 1 Settings of the PMAlign tools in the vision tasks for the different viewpoints

                 Rotation                      Scale
Image type       Max.    Nominal   Min.        Max.   Nominal   Min.
Top views        180     0         −180        1.2    1         0.8
Side views       45      0         −45         1.2    1         0.8

                 x-transformation              y-transformation
Image type       Max.    Nominal   Min.        Max.   Nominal   Min.
Top views        0       0         0           0      0         0
Side views       1.2     1         0.8         0      0         0

In vision task 0, there is one extra tool called the CalibCheckerboard tool. This tool calibrates the camera so that the camera coordinates can be mapped to the user-defined, real-world coordinate system. The calibration is done by placing a checkerboard under the camera and supplying the size of each square on the board to the tool. The tool then computes the transformation between the image coordinates and the world coordinates by mapping the intersections (points) on the checkerboard to the 2D positions of the points in the image. The transformation is defined by [45]:

$$\mathbf{x} = K\,[R \mid t]\,\mathbf{X} \tag{1}$$

The expanded form of the equation is shown by:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2}$$

where x is the 2D image coordinate, K the camera parameters, R the 3×3 rotation matrix, t the translation vector and X the 3D world coordinate. Note that since the checkerboard used is a 2D object, the values for the z direction can be any constant value and are automatically set by the CalibCheckerboard tool in VisionPro. A least squares approach is used to compute the translation vector and rotation matrix, since the equation system is over-determined. This is necessary because noise exists in the images and an exact solution cannot be computed; instead, an approximate solution is obtained by solving the over-determined system. The CalibCheckerboard tool is used to enable the PMAlign tool to return the pose of the recognised object in real-world dimensions, so that this information can be used for robot guidance in later stages of a manufacturing process. The camera was only calibrated at position 1, since only the rotation and the x and y coordinates were of interest. Once the calibration process is done, the camera should not be moved again, otherwise it will introduce inaccuracies. This was not a problem in this system, since the KUKA robot has a repeatability of ±0.1 mm. This amount of error did not affect the accuracy of the calibration process.
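As a concrete illustration of Eq. (2), the short NumPy sketch below projects a world point through an assumed intrinsic matrix K and pose [R | t] and normalises by the homogeneous coordinate. The numerical values are arbitrary examples, not calibration results from this system.

```python
import numpy as np

# Illustrative intrinsic parameters and pose; the numbers are arbitrary
# examples, not values measured on the system described in the paper.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # rotation matrix with entries r_ij
t = np.array([[0.05], [0.02], [0.9]])  # translation vector

X = np.array([[0.03], [0.04], [0.0], [1.0]])  # homogeneous world point, Z = 0 on the board plane

x_h = K @ np.hstack([R, t]) @ X        # Eq. (1)/(2): x = K [R | t] X
u, v = (x_h[:2] / x_h[2]).flatten()    # normalise by the homogeneous coordinate
print(u, v)                            # pixel coordinates of the projected point
```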

Every time a command is received to execute a vision tool, the AcqFifo will take an image of the object and feed it to each of the PMAlign tools. Each of these will then try to fit and match its trained image to the image received, within the amount of orientation and transformation specified. When any of the PMAlign tools finds an object or pattern in the image that is close to its trained image, it will generate a score and the pose of the object. These results will then be transmitted to the LabVIEW program through the TCP protocol for analysis. Sometimes, a vision tool will find that the view of the object being recognised looks similar to more than one trained image and produce more than one set of results. Therefore, the results are indexed to avoid confusion, for example, score1 and score2 in the case where two possible solutions exist.

4.3 LabVIEW and KUKA robot programming

There were two programming tools used to assist in the construction of the object recognition system. These were the LabVIEW and KUKA robot programming languages. LabVIEW was used extensively to construct the main program to control and make decisions for the whole system, whereas the KUKA robot programming language was used to build the motion control program to control the robot.

LabVIEW is a high-level programming language which is widely used in industry for signal acquisition, data manipulation and transmission, measurement analysis, and automation control. It was chosen as the programming tool to construct the main program in this research because it offers many pre-defined functions for communication, in particular the TCP and OPC protocols. During the construction of the main program, a number of sub-virtual instruments (subVIs) were created to simplify and modularise the program.

In order to have the robot running continuously and automatically according to the signals received from the vision computer, it is necessary to program it using the KUKA robot programming language. A motion control program was written for this purpose. The KUKA robot controller offers two programming modes, namely the user mode and the expert mode. User mode provides all the standard commands to control the robot, whereas the expert mode is a higher level of programming, which allows the programmer to have full access to all the features that would be hidden in user mode. The motion control program was written under the expert mode, as these additional features were required to fully control the robot.

5 Object recognition algorithms

The LabVIEW program is used to make decisions about the next step in the object recognition process and to command different parts of the system to function. The algorithm for object recognition is one of the most important parts of the program.

5.1 Algorithm for 2D object recognition

Recognising 2D objects is a relatively simple task, since only the top view of the objects needs to be considered. One simple algorithm for 2D object recognition is shown in Fig. 10. Firstly, the LabVIEW program commands the VisionServer to take an image of the object from the top and match it against the images of all the models in the database. The VisionServer will then compute a score for each model to indicate how closely each model matches the object, with 1.0 being the highest and 0.0 being the lowest. After that, the LabVIEW program can identify what the object is by comparing the scores of the various models. One simple method is to compare the scores with a threshold value. The model is reported as the object only if the score is above the threshold value.

Fig. 10 Two-dimensional object recognition algorithm

Sometimes more than one model will get a score above the threshold value due to the similarity of the models in the database. In this case, the model with the highest score is reported as the object. On the other hand, if none of the scores is above the threshold value, the program reports an error. After the object is recognised, the next 2D object can be processed using the same procedure. Since this algorithm is simple and only one image needs to be checked against the trained images in the system at each time, problems such as long computational time and view planning, which are common in 3D object recognition systems, do not exist.
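The score comparison just described amounts to a simple argmax-above-threshold rule. The sketch below is a minimal illustration of that decision logic; the score values and model names are invented for the example and do not come from the system's database.

```python
def identify_2d(scores, threshold=0.55):
    """Pick the model whose top-view score exceeds the threshold.

    scores: dict mapping model name -> PMAlign-style match score in [0.0, 1.0].
    Returns the best model name, or None if no score clears the threshold
    (which the LabVIEW program would report as an error).
    """
    best_model, best_score = None, 0.0
    for model, score in scores.items():
        if score >= threshold and score > best_score:
            best_model, best_score = model, score
    return best_model

# Example with made-up scores: two models clear the threshold, the higher wins.
print(identify_2d({"gasket": 0.91, "washer": 0.62, "bracket": 0.18}))  # -> "gasket"
```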

5.2 Algorithm for 3D object recognition

Three-dimensional object recognition is much more complex than the 2D case. This is because a 3D object can have many different faces and sitting bases. The appearance of the object can be very different from one viewpoint to another. Moreover, an object recognition system often needs to identify more than one object. If the camera is only fixed in one place to take images, it may not have enough information to identify an object if it looks similar to more than one model in the database. It may also report the wrong model name when clutter or occlusion exists. Therefore, taking multiple images from different angles is a common approach in 3D object recognition systems.

Consider the case where there are several 3D objects to be recognised and there is no restriction on the number of sitting bases each object can have. One approach to solving this problem is to obtain and train several views of each object that can be seen by the camera and store them in the system. At run-time, if any one of the images in the database looks similar to the object being recognised, the system will report the name of the model that this image belongs to. Otherwise, it will go to another viewpoint to take a new image and repeat the same process until it finds a trained image that looks similar to the new view of the run-time object. Sometimes, there may be two or more images from different models which appear similar to the view of the object due to the similarity of the models. In this case, the system will need to take another image from a new position and repeat the process again until there is only one image in the database which matches the run-time image. Figure 11 shows the flowchart of this algorithm.

Fig. 11 Simple 3D object recognition algorithm

This algorithm is easy to implement but it has many drawbacks. One of the major problems is the long computational time needed. If there is only one model in the database, this algorithm can perform reasonably fast; however, this is not a practical setup. If there are more models to be recognised, or if the models have more complex shapes and more faces, there will be many more trained images in the database. For example, if there are n models to be identified and each has m faces, there will then be m×n trained images in the database of the system. Since there is no restriction on the number of sitting bases each object can have, the system has no idea which view of the object will face the camera. As a result, every time an image is taken, the system needs to compare the image with all the trained images in the database. This process is extremely time consuming and occupies a lot of computer resources. This problem is magnified when there are similar models in the database or when the system fails to find a trained image which looks similar to the run-time image because, as shown in the flowchart in Fig. 11, the system will simply repeat the same process until it finds one model that matches. Another problem with this algorithm is that if an object that has not been trained is put into the system for recognition, the system will simply go into an endless loop and never finish, since it can never find a model that matches the image of the new object. These disadvantages make it impractical for use in real-life applications, and therefore an improved method is derived.

5.3 Improved algorithm for 3D object recognition

To improve the above algorithm, one method is to restrict all the 3D objects to one sitting base only. By placing this restriction, any 3D object will always have the same face showing upwards no matter what pose it is placed in under the camera. This means that every time a new object recognition process starts, the robot can always go to the top (position 1) to take an image of the object and compare it with the top views of all the other models in the database. After the images have been compared, the system will have an idea of which model the object being processed is likely to be by comparing the scores. It can output what the object is based on the top view alone. If the system is uncertain what the object is, it goes to the side positions (either position 2 or 3) to check the object's side views to further verify whether or not it is the one that the system initially determined before the result is output. Allowing the system to output the result based on the top view alone makes the process faster; however, it also decreases the accuracy. On the other hand, if the system checks at least two views of the object before it outputs the result, the system will perform more slowly, but will have a much higher accuracy. There is a trade-off between speed and accuracy, and this decision should be made based on the situation. In this research, it was chosen to check two views for 3D object recognition to enhance the accuracy of the system.

Using the proposed approach, if the run-time object looks similar to two or more models from the top, the KUKA robot will analyse the side view of the run-time object from position 2 by checking it against all the side views of all the possible models. If the system can identify which model the object is, it will stop and output the result; otherwise, it will go to position 3 and repeat the same process. If the system still cannot determine what the object is, it will output a failure message and stop the object recognition process at this point. This is to prevent the system from running for a long period of time to handle complex or new objects.

This improved algorithm is fast and accurate compared with the simple 3D object recognition algorithm described in Fig. 11. For every object recognition process, only a small portion of the trained images in the database needs to be looked at. This minimises the time required and the workload of the vision computer. In addition, when there is not sufficient information to identify the object due to the presence of similar models in the database, the system will automatically go to a new position to take another image of the object and perform the analysis again. If the system still cannot identify the object after three runs, it will stop and report a failure. This algorithm can also be used in cases where objects have two or more sitting bases. This is achieved by treating the object sitting on the other base as a separate object. The top view and side views of this new object can be obtained as shown in Figs. 12 and 13.
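The decision flow of this improved algorithm can be summarised in a few dozen lines. The sketch below is a simplified rendition under stated assumptions: score_views stands in for a PMAlign-style scoring call, the database is a plain dictionary of top and side views, and the viewpoint names "top", "side2" and "side3" are placeholders for positions 1 to 3.

```python
def recognise_3d(capture, score_views, database, threshold=0.55):
    """Simplified rendition of the improved 3D recognition flow.

    capture(position)         -> image taken at "top", "side2" or "side3"
                                 (stand-ins for the robot's positions 1-3)
    score_views(image, views) -> dict of view name -> match score (PMAlign-like)
    database                  -> {model: {"top": top_view, "sides": [side_views]}}
    Returns the recognised model name, or None when recognition fails.
    """
    def candidates(image, views):
        return [name for name, s in score_views(image, views).items() if s >= threshold]

    # Position 1: compare the run-time top view with the top views of all models.
    top_image = capture("top")
    possible = candidates(top_image, {m: d["top"] for m, d in database.items()})
    if not possible:
        return None                               # no model resembles this object

    # Positions 2 and 3: verify against the side views of the remaining candidates.
    for position in ("side2", "side3"):
        side_image = capture(position)
        possible = [m for m in possible
                    if candidates(side_image, dict(enumerate(database[m]["sides"])))]
        if len(possible) == 1:
            return possible[0]                    # exactly one model still matches
        if not possible:
            return None                           # nothing matches any more: failure
    return None                                   # still ambiguous after three views
```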

Fig. 12 Nokia 8310 mobile phone on the original sitting base

Fig. 13 Nokia 8310 mobile phone on the new sitting base


This improved algorithm for 3D object recognition is the algorithm implemented in the system; its flowchart is shown in Fig. 14. It is combined with the 2D object recognition, forming the LabVIEW program used for the object recognition system. This program is capable of determining whether the run-time object is 2D or 3D by looking at the index number of the matched top view in the vision application. In the case study for this research, if the index number is one, two or three, the object is treated as a 2D object; otherwise it is a 3D object. This means that users trained and saved images of 2D objects in the first three PMAlign tools in vision task 0, while the remainder of the PMAlign tools in vision task 0 are used for 3D objects in this case study.
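As a small illustration of the 2D/3D decision just described (the tool indices simply restate the case-study convention; the function name is hypothetical):

```python
def is_2d_object(matched_tool_index: int) -> bool:
    """2D if the matching top view came from one of the first three PMAlign
    tools in vision task 0; otherwise the run-time object is treated as 3D."""
    return matched_tool_index in (1, 2, 3)
```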

5.4 User interface of the LabVIEW program

The user interface of the LabVIEW program displays useful information and allows users to control the operation of the system easily. Figure 15 shows the LabVIEW program developed in this research.

Fig. 14 Flowchart of the adopted, improved 3D object recognition algorithm, which improves the performance of the system when dealing with complex objects


The ‘Machine’ and ‘Port’ input fields are required to configure the TCP communication with the VisionServer. This information must be supplied so that the TCP client on the vision computer knows which TCP server it should connect to. If the connection to the VisionServer fails, the program stops automatically and reports an error.
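A minimal sketch of the client side of this configuration is given below, assuming a plain TCP socket; the host name, port number and error text are placeholders, and the actual message format exchanged with the VisionServer is not reproduced.

```python
import socket

def connect_to_vision_server(machine: str = "vision-pc", port: int = 3000,
                             timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to the VisionServer, raising on failure."""
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect((machine, port))
    except OSError as err:
        client.close()
        # Mirrors the LabVIEW behaviour: stop and report an error on failure.
        raise ConnectionError(f"Cannot reach VisionServer at {machine}:{port}") from err
    return client
```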

The ‘Threshold Input’ field allows users to input and change the threshold value used for object recognition in this system. In certain situations there are many similar objects to be recognised, and a higher threshold value is necessary to distinguish between them. However, a high threshold value is not always preferred, since it cannot tolerate slight changes in an object's appearance. It is difficult to determine a fixed threshold value for all these different situations; the threshold is therefore made adjustable so that the system can be used in a wider range of applications. Experimental results have shown that a threshold value of around 0.5–0.6 is suitable for most cases.
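The role of the threshold can be illustrated with a small sketch; the score values and model names are made up, and the function is not part of the actual LabVIEW program:

```python
DEFAULT_THRESHOLD = 0.55   # experiments suggest 0.5-0.6 works for most cases

def accepted_models(scores: dict, threshold: float = DEFAULT_THRESHOLD):
    """Return the model names whose match score reaches the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Example: with similar objects, a higher threshold separates them more strictly.
scores = {"car key 1": 0.82, "car key 2": 0.58, "locker key": 0.21}
print(accepted_models(scores))            # ['car key 1', 'car key 2']
print(accepted_models(scores, 0.7))       # ['car key 1']
```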

The ‘TCP Sent’ and ‘TCP Received’ display boxes show the last set of communication between the VisionServer and the LabVIEW program. The ‘History’ display box shows all the past communications between the main program and the VisionServer; it allows users to check the past operations of the system and the scores generated by the VisionServer for different models in a process, in order to verify the result. The ‘Message’ display box shows the current status of the system and enables users to observe its current operation. It can also prompt users to put in a new object and click the ‘START’ button to start a new object recognition process when the system is ready.

The ‘Result’ display box displays the result after an object recognition process is completed. If the system successfully recognises an object, it outputs the object name along with its x, y coordinates and orientation relative to the centre of the camera, i.e. the calibrated point and axes. Otherwise, it reports a failure and attempts to explain the possible reasons behind the failure. The ‘Processing Time’ display box shows the total time taken for an object recognition process in seconds. The system starts timing once the ‘START’ button is clicked and stops when the result is displayed. This is useful for determining the performance of the system. The ‘START’ button allows users to start an object recognition process at any time when the system is ready, and the ‘STOP’ button allows users to stop the program at any time.

6 Performance of the system

After the system was successfully developed and implemented on the KUKA robot, two sets of experiments were carried out to verify its performance. Several different 2D and 3D real-life objects were selected for the experiments. Some of the selected objects were similar to others in appearance; the purpose of this was to verify the system's ability to handle similar objects. Figure 16 shows an image of the objects, including 2D objects such as two car keys and a locker key, and 3D objects such as a yellow cellotape, a white cellotape, a Nokia 6100 mobile phone, a Nokia 8310 mobile phone with back cover, and a Nokia 8310 mobile phone without back cover.

In the experiments conducted, the performance of the system was rated according to the image processing time, success rate, and accuracy in recognition. The success rate and accuracy are defined in Equations (3) and (4), respectively.

$$\text{success rate} = \frac{\text{number of correct recognitions}}{\text{number of experiments}} \qquad (3)$$

$$\text{accuracy} = \frac{\text{number of actual correct recognitions}}{\text{number of claimed correct recognitions}} \qquad (4)$$

Fig. 15 Screenshot of the user interface of the LabVIEW program

Fig. 16 The seven objects used in the experiments. The top view of the objects is shown
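As a worked illustration (the numbers are hypothetical but consistent with the results reported below), suppose an object is presented in ten experiments, the system claims a correct recognition in nine of them, and eight of those claims are actually correct. Then

$$\text{success rate} = \frac{8}{10} = 80\%, \qquad \text{accuracy} = \frac{8}{9} \approx 88.9\%.$$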

6.1 General performance of the system

The aim of this experiment was to determine the performance of the system with different 2D and 3D objects in a standard indoor environment. All the objects have only one sitting base. The same set of trained images was used throughout the experiment. Other factors which could also affect the performance of the system were kept constant throughout the whole experiment: the lighting conditions, the number of trained images in the VisionServer for each object, and the threshold value. The experiment was conducted using the following steps:

1. Take one object and place it at a randomly chosen pose within the field of view (FoV) of the camera.

2. Execute the program to perform the object recognition process.

3. Record the results and measure the actual pose of the object.

4. Repeat steps 1–3 ten times.

5. Repeat steps 1–4 for all eight objects.

Table 2 summarises the results for this experiment. It is shown that the object recognition system has very high performance.

6.1.1 Two-dimensional objects

In the experiment for 2D objects, the system was able to recognise all of the objects within six seconds. Both the success rate and the accuracy were 100%, even though two of the objects were highly similar in appearance (car key 1 and car key 2). The pose of the 2D objects had no effect on the system's performance.

6.1.2 Simple 3D objects

In the experiment for simple 3D objects (the yellow and white cellotapes), the system was able to recognise the objects in approximately 12 seconds. Both the success rate and accuracy were 100%. Two camera poses were used each time (one top view and one side view) for all objects in all runs, except for the second run of the yellow cellotape. In this run, images were taken from three different camera poses and it took approximately 19 seconds to recognise the object. This was due to the misplacement of the object in the experiment. If the objects were placed within the FoV of the camera, the system would be able to recognise any object with only two images, regardless of the pose of the objects.

6.1.3 Complicated 3D object

In the testing of a more complicated 3D object (the Nokia 6100 mobile phone), more time was required by the system to recognise the object compared to the two simple 3D objects above, with an average of 14.45 seconds. This was because the object has more faces and more trained images were stored in the database for it. When the system went to position 2 to check the object's side views, it was required to compare the run-time image with more trained images and therefore took a longer period of time to compute the result.

6.1.4 Complicated and highly similar 3D objects

For the two complicated and similar 3D objects, the Nokia 8310 mobile phone with and without back cover, it was found that these objects took the longest time for the system to process, with an average of approximately 21.6 seconds. The range of process times for these objects was large, due to the orientation of the objects during the experiment. These two objects had the same top views, one pair of identical side views, and three pairs of similar side views.

Table 2 Performance of the system in a clustered scene

Object                          Processing time (s)            Success rate (%)   Accuracy in recognition (%)   Number of trained images in the system
                                Mean      Max.      Min.
Car key 1                       5.12      6.95      4.01       100                100                           1 (top)
Car key 2                       5.15      5.44      4.86       100                100                           1 (top)
Locker key                      4.89      5.05      4.79       100                100                           1 (top)
Yellow cellotape                12.39     13.33     11.39      100                100                           1 (top), 2 (side)
White cellotape                 11.91     13.31     10.90      100                100                           1 (top), 2 (side)
Nokia 6100                      14.45     16.32     12.37      100                100                           1 (top), 8 (side)
Nokia 8310 with back cover      21.64     25.69     14.92      90                 100                           1 (top), 8 (side)
Nokia 8310 without back cover   21.54     26.01     14.36      80                 89.9                          1 (top), 8 (side)

The processing time, success rate and accuracy are obtained from ten recognition processes for each object.


In some orientations, the objects appeared very similar to each other. This caused the system to be unable to distinguish between the objects, and three camera poses were required to take images of the objects. From the experiment, it was also found that when these two objects were placed at an orientation in the range 0–π radians, the system was able to identify the objects with only two images, and the process time was much shorter, at approximately 15 seconds. When the objects were placed at an orientation outside the range 0–π radians, the system needed three images from different camera poses and a time of approximately 25 seconds. This shows that the process time required can be controlled by placing the objects in different orientations. Figure 17 shows the coordinate system defined for the objects.

It was also found that the system had a lower success rate and accuracy for these two objects, compared to the other objects shown in Fig. 16. The system was able to recognise the object with the back cover 90% of the time with an accuracy of 100%. The success rate and accuracy dropped to 80% and 88.9%, respectively, for the object without the back cover. The system failed when the objects were placed at an orientation of approximately two radians. Possible causes for this include the shadow of the object, created by the object itself, or the presence of other objects in the background.

6.2 Performance of the system with fewer trained images

The aim of this experiment was to find out whether the performance of the system would be affected by the number of trained images in the database. The same settings as in the previous experiment were used, except that the number of side view images stored for each object was reduced. This experiment focused only on the more complicated 3D objects, that is, the Nokia 6100 mobile phone, the Nokia 8310 mobile phone with back cover and the Nokia 8310 mobile phone without back cover.

The method for this experiment was the same as in the previous experiment, except that the number of trained side view images for each object was reduced from eight to four.

Table 3 summarises the results for this experiment. It is shown that with fewer trained images in the database, the system performs two to four seconds faster, as fewer comparisons are made. The downside to using fewer trained images is that the KUKA robot was forced to take images of the objects from all three poses more frequently, as less information was available and the system was less certain about the initial guess based on the top view of the run-time object. This also caused a drop in the success rate and accuracy compared to the previous experiment.

7 Conclusions

This paper presented a computer vision system that has been successfully developed to recognise general 2D and 3D objects using a CCD camera and the VisionPro software package. A reliable communication link between the computer vision system and the KUKA robot controller has been established using the TCP protocol. It allows the vision system to control the KUKA robot to take images of an object from different poses and perform the object recognition process. The vision system is efficiently integrated with the KUKA robot. A vision application has been constructed by integrating the suitable vision tools inside QuickStart and VisionServer. Object recognition algorithms have been developed to improve the performance of the system, and a user interface has been successfully developed to allow easy operation by users.

Fig. 17 The coordinate system defined for the Nokia 8310 mobile phone with and without back cover

Table 3 Performance of the system using fewer trained images, compared to the first experiment

Object                          Processing time (s)            Success rate (%)   Accuracy in recognition (%)   Number of trained images in the system
                                Mean      Max.      Min.
Nokia 6100                      13.89     21.07     11.01      90                 100                           1 (top), 4 (side)
Nokia 8310 with back cover      19.63     22.71     12.03      70                 100                           1 (top), 4 (side)
Nokia 8310 without back cover   19.87     22.54     13.48      70                 100                           1 (top), 4 (side)

The processing time, success rate and accuracy are obtained from ten recognition processes for each object.


Intensive testing was performed to verify the performance of the system. It was able to recognise any general 2D object within six seconds. The success rate and accuracy for 2D objects were both measured to be 100%. The performance of the system in 3D object recognition was affected by the number of trained images stored in the database, the complexity of the object to be recognised, and the presence of similar objects in the database. Overall, the system was able to identify any simple 3D object in approximately 12 seconds and complex objects in approximately 14 seconds. For similar and complicated 3D objects, it took the system around 20 seconds to identify them. The overall accuracy and success rate of the system were close to 100% for 3D objects.

Future work includes a number of improvements which will enhance the robustness and efficiency of the system. The first is the use of affine invariant methods for 3D object recognition. These methods utilise local descriptors for feature matching and have been shown to achieve higher success and accuracy rates when complex objects, clutter or occlusion are a problem [46]. An automatic threshold selection process can also be implemented. This will eliminate the need to manually select a threshold for determining whether an object is correctly recognised. The current threshold value was determined empirically and is not guaranteed to work in every possible case. Lastly, the system can be optimised to reduce the processing time required for recognising objects. No effort was made to improve the efficiency of the LabVIEW program developed in this research. Also, both the vision computer and the KUKA robot controller operate under Microsoft Windows. If the system were developed on a real-time operating system and in a programming language such as C++, the performance of the overall system would improve drastically.
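As an indication of the local-descriptor direction only (not part of the developed system), the sketch below matches descriptors between a trained image and a run-time image and applies a ratio test; it uses OpenCV's freely available ORB features as a stand-in for the SURF descriptors of [46], and the file paths are placeholders.

```python
import cv2

def count_good_matches(trained_path: str, runtime_path: str, ratio: float = 0.75) -> int:
    """Count descriptor matches passing Lowe's ratio test between two images."""
    trained = cv2.imread(trained_path, cv2.IMREAD_GRAYSCALE)
    runtime = cv2.imread(runtime_path, cv2.IMREAD_GRAYSCALE)
    if trained is None or runtime is None:
        raise FileNotFoundError("could not read one of the input images")

    orb = cv2.ORB_create(nfeatures=1000)      # local keypoint detector/descriptor
    _, desc_t = orb.detectAndCompute(trained, None)
    _, desc_r = orb.detectAndCompute(runtime, None)
    if desc_t is None or desc_r is None:
        return 0                               # too little texture to describe

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(desc_t, desc_r, k=2)
    # Keep a match only when it is clearly better than its second-best rival.
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < ratio * p[1].distance)
```

The count of surviving matches could then serve as a relative confidence measure in place of a fixed, manually selected threshold.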

References

1. Krar A, Gill S (2003) Exploring Advanced Manufacturing Technologies. Industrial Press Inc., New York
2. Abdullah M, Guan L, Lim K, Karim A (2004) The applications of computer vision system and tomographic radar imaging for assessing physical properties of food. J Food Eng 61:125–135, DOI 10.1016/S0260-8774(03)00194-8
3. Billingsley J, Dunn M (2005) Unusual vision - machine vision applications at the NCEA. Sens Rev 25(3):202–208, DOI 10.1108/02602280510606480
4. Brosnan T, Sun DW (2004) Improving quality inspection of food products by computer vision - a review. J Food Eng 61:3–16, DOI 10.1016/S0260-8774(03)00183-3
5. Gunasekaran S (1996) Computer vision technology for food quality assurance. Trends Food Sci Tech 7(8):245–256, DOI 10.1016/0924-2244(96)10028-5
6. Burschka D, Li M, Taylor R, Hager GD (2004) Scale-invariant registration of monocular endoscopic images to CT scans for sinus surgery. Lect Notes Comput Sc 3217(1):413–421, DOI 10.1016/j.media.2005.05.005
7. Yaniv Z, Joskowicz L (2005) Precise robot-assisted guide positioning for distal locking of intramedullary nails. IEEE Trans Med Imaging 24(5):624–625, DOI 10.1109/TMI.2005.844922
8. Galantucci L, Percoco G, Spina R (2004) An artificial intelligence approach to registration of free-form shapes. CIRP Ann Manuf Technol 53(1):139–142
9. Varady T, Martin RR, Cox J (1997) Reverse engineering of geometric models - an introduction. CAD Comput Aided Des 29(4):255–268, DOI 10.1016/S0010-4485(96)00054-1
10. Zhang H, Zhang G, Shi Y, Zhao X (2005) Application of binocular vision probe on measurement of high-reflective metallic surface. Proc SPIE Int Soc Opt Eng 5633:333–338, DOI 10.1117/12.570778
11. Bowyer K, Chang K, Flynn P (2006) A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition. Comp Vis Image Und 101(1):1–15, DOI 10.1016/j.cviu.2005.05.005
12. Conde C, Rodriguez-Aragon LJ, Cabello E (2006) Automatic 3D face feature points extraction with spin images. Lect Notes Comput Sc 4142:317–328
13. Hu W, Tan T, Wang L, Maybank S (2004) A survey on visual surveillance of object motion and behaviors. IEEE Trans Syst Man Cybern Pt C Appl Rev 34(3):334–352, DOI 10.1109/TSMCC.2004.829274
14. Zubal G, Tagare H, Zhang L, Duncan J (1991) 3-D registration of intermodality medical images. Proc Annu Conf Eng Med Biol 13(1):293–294, DOI 10.1109/IEMBS.1991.683942
15. Lobo J, Dias J (2003) Vision and inertial sensor cooperation using gravity as a vertical reference. IEEE Trans Pattern Anal Mach Intell 25(12):1597–1608, DOI 10.1109/TPAMI.2003.1251152
16. Moravec HP (1980) Obstacle avoidance and navigation in the real world by a seeing robot rover. PhD thesis, Carnegie-Mellon University
17. Benhabib B (2003) Manufacturing: design, production, automation, and integration. CRC Press, Boca Raton, FL, USA
18. Hornberg A (ed) (2006) Handbook of Machine Vision. Wiley-VCH, Weinheim, Germany
19. (2006) Cognex machine vision systems and machine vision sensors. http://www.cognex.com/ Accessed September 15, 2006
20. (2006) National Instruments - test and measurement. http://www.ni.com/ Accessed September 15, 2006
21. Wong A, Rong L, Liang X (1998) Robotic vision: 3D object recognition and pose determination. IEEE Int Conf Intell Rob Syst 2:1202–1209, DOI 10.1109/IROS.1998.727463
22. Chan V, Bradley C, Vickers G (2001) A multi-sensor approach to automating co-ordinate measuring machine-based reverse engineering. Comput Ind 44(2):105–115, DOI 10.1016/S0166-3615(00)00087-7
23. Liao J, Wu M, Baines R (1999) Coordinate measuring machine vision system. Comput Ind 38(3):239–248, DOI 10.1016/S0166-3615(98)00093-1
24. Brunelli R, Poggio T (1993) Face recognition: features versus templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1052, DOI 10.1109/34.254061
25. Zitová B, Flusser J (2003) Image registration methods: a survey. Image Vision Comput 21(11):977–1000, DOI 10.1016/S0262-8856(03)00137-9
26. Chen HY, Li YF (2004) Non-model-based view planning for active vision. Proc Annu Conf Mechatronics Machine Vision Pract MViP2004, pp 7–15
27. Gao J, Xu W, Geng J (2006) 3D shape reconstruction of teeth by shadow speckle correlation method. Opt Lasers Eng 44(5):455–465, DOI 10.1016/j.optlaseng.2005.04.013


28. Mikolajczyk K, Schmid C (2005) A performance evaluation of local descriptors. Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit 2:257–263
29. Mikolajczyk K, Tuytelaars T, Schmid C, Zisserman A, Matas J, Schaffalitzky F, Kadir T, Van Gool L (2005) A comparison of affine region detectors. Int J Comput Vision 65(1–2):43–72, DOI 10.1007/s11263-005-3848-x
30. Brown LG (1992) A survey of image registration techniques. ACM Comput Surv 24(4):325–376, DOI 10.1145/146370.146374
31. Goshtasby A (2005) 2-D and 3-D image registration. Wiley, Hoboken
32. Mikolajczyk K, Schmid C (2004) Comparison of affine-invariant local detectors and descriptors. Proc 12th European Signal Processing Conference, pp 1729–1732
33. Büker U, Drüe S, Götze N, Hartmann G, Kalkreuter B, Stemmer R, Trapp R (2001) Vision-based control of an autonomous disassembly station. Robot Auton Syst 35(3–4):179–189, DOI 10.1016/S0921-8890(01)00121-X
34. Jeong S, Chung J, Lee S, Suh IH, Choi B (2005) Design of a simultaneous mobile robot localization and spatial context recognition system. Lect Notes Comput Sc 3683 LNAI:945–952, DOI 10.1007/11553939
35. Harris C, Stephens M (1988) A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference, pp 147–151
36. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91–110, DOI 10.1023/B:VISI.0000029664.99615.94
37. Peña Cabrera M, Lopez-Juarez I, Rios-Cabrera R, Corona-Castuera J (2005) Machine vision approach for robotic assembly. Assem Autom 25(3):204–216, DOI 10.1108/01445150510610926
38. Abdullah M, Bharmal M, Sardi M (2005) High speed robot vision system with flexible end effector for handling and sorting of meat patties. 9th International Conference on Mechatronics Technology
39. Hough P (1962) Method and means for recognizing complex patterns. US Patent 3069654
40. (2005) KUKA robot manual. KUKA, http://www.kuka.com/usa/en/newsevents/downloads/ Accessed September 14, 2006
41. Berge J (2005) Software for automation: architecture, integration, and security. ISA, Research Triangle Park, NC, USA
42. Iwanitz F, Lange J (2006) OPC: fundamentals, implementation & application, 3rd edn. Hüthig Fachverlag, Heidelberg
43. Held G (2003) The ABCs of TCP/IP. CRC Press
44. (2005) PatMax and PatQuick. Cognex, One Vision Drive, Natick, MA 01760-2059, USA
45. Hartley RI, Zisserman A (2003) Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge
46. Bay H, Tuytelaars T, Van Gool L (2006) SURF: speeded up robust features. Lect Notes Comput Sc 3951 LNCS:404–417
