Procedural Generation of 3D Models Based on 2D Model Sheets

Robin Arys
Student number: 01000509

Supervisor: Prof. dr. Peter Lambert
Counsellor: Ignace Saenen

Master's dissertation submitted in order to obtain the academic degree of
Master of Science in Computer Science Engineering

Academic year 2019-2020


Admission to loan

The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In all cases of other use, the copyright terms have to be respected, in particular with regard to the obligation to state explicitly the source when quoting results from this master dissertation.

Robin Arys, June 2020

Preface

I would like to express my sincere gratitude to my counsellor, Ignace Saenen, for the useful feedback and suggestions, and for putting up with my many questions. Besides my counsellor, I would also like to thank my supervisor, Prof. dr. Peter Lambert, for allowing me to pursue the topic I proposed and for being understanding when it took longer than I initially envisioned.

I would like to thank both of my parents. Even though they're no longer here physically, I know they would be proud right now. Besides my parents, I would also like to thank the rest of my family and all of my friends for the support they have provided me during some very difficult times. And in particular, I would like to thank my girlfriend, Jelena, for putting up with me for over a decade now.

Robin Arys, June 2020

Procedural Generation of 3D Models Based on 2D Model Sheets

Robin Arys
Supervisor: Prof. dr. Peter Lambert

Counsellor: Ignace Saenen

Abstract

This thesis outlines how existing computer vision methods, more precisely multi-view stereo reconstruction methods, can be used to procedurally generate 3D models from 2D model sheets. It describes the challenges that arise when using computer vision methods on model sheets instead of photographs, as well as how these challenges can be mitigated. For each step of the proposed method, alternative methods are described and their performance is compared on model sheets. The proposed method is evaluated on computer-generated model sheets and some pointers are given towards possible future research. The method used to generate model sheet-like renders from a 3D model is also described.

Keywords

Model Sheets, Sketch-Based Modelling, Computer Vision, Multi-View Stereo Reconstruction, 3D Model Generation

Procedural Generation of 3D Models Based on 2D Model Sheets

Robin Arys

Supervisor(s): Prof. dr. Peter Lambert, Ignace Saenen

Abstract—This article outlines how existing multi-view stereo reconstruction methods can be used to procedurally generate 3D models from 2D model sheets. This method has been tested on different computer-generated model sheets.

Keywords— Model Sheets, Computer Vision, Multi-View Stereo Reconstruction, 3D Model Generation, Sketch-Based Modelling

I. INTRODUCTION

A typical procedure for the geometric modelling of three-dimensional models is to start with a simple primitive such as a cube or sphere, and to gradually construct a more complex model through successive transformations or a combination of multiple primitives. To aid 3D modellers in this and to keep designs consistent across a production, character and prop designers create drawings of the model that has to be designed from different angles and in different poses, the so-called model sheets.

The problem with this method is that it takes a lot of work to get from a simple cube to a high-polycount character model. In this abstract, a method is proposed that will procedurally generate a rough draft for the final model that can serve as a better starting point for 3D modellers, by leveraging the model sheets that are already provided by the designers.

The proposed method makes use of existing multi-view stereo reconstruction methods that have been fine-tuned to perform better on model sheets. This method has been evaluated on different computer-generated model sheets.

II. GENERATING 3D MODELS FROM 2D MODEL SHEETS

The proposed method requires at least three different model sheet views of the same model, all at different angles, but these views do not necessarily need to be equiangular. Pairs are created from these model sheet views, each pair containing two adjacent angles. For each of those image-pairs, the underlying epipolar geometry is recovered through the fundamental matrix and used to reconstruct a disparity map for each view in the pair. After this, all disparity maps for the same view that were generated in different image-pairs are combined to get a better disparity map for that view.

A. Feature Detection and Matching

The first step in multi-view stereo reconstruction methods is always to find as many corresponding corners between the different views as possible. These correspondences will then be used to recover the fundamental matrix describing the epipolar geometry underlying these views. Usually in multi-view computer vision problems, there is an abundance of features to detect and match. Since model sheets contain no texture information that can be used by the feature detector, but rather contain only edges and large, blank spaces, detecting good features on a model sheet is a lot harder.

In the following step, the proposed method will use an eight-point algorithm to recover the fundamental matrix. This means that at least 8 correspondences are needed between both images in each image-pair. Eight corresponding points is the bare minimum necessary to make the algorithm work, but adding more correspondences is desired, since that will increase the quality of the fundamental matrix estimation that can be calculated in the next step.

The combination of a Shi-Tomasi feature detector [1] with an ORB feature descriptor (Oriented FAST and Rotated BRIEF) [2], matched with a simple brute-force algorithm using a cross-checking validity test, provides the best results when matching features between model sheets. The Shi-Tomasi detector does not use a lot of texture information when detecting corners and thus produces more stable results when used on textureless model sheets than other popular feature detectors that do rely heavily on texture information. The ORB feature descriptor works by computing binary strings for each feature by comparing the intensities of pairs of points around that feature. This method is not only very fast but also performs well on model sheets. Since the number of features that can be reliably found in most model sheets is relatively low (on the order of tens to a few hundred), a basic brute-force algorithm can be used to match features between both images in each image-pair. Using a cross-checking validity test in the brute-force matching algorithm increases the quality of the estimation for the fundamental matrix.
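
A minimal sketch of this detector/descriptor/matcher combination, using OpenCV's Python bindings (the thesis compares OpenCV implementations); the file names and parameter values below are illustrative placeholders, not the values used in the evaluation.

import cv2

left = cv2.imread("view_040.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("view_045.png", cv2.IMREAD_GRAYSCALE)

def shi_tomasi_keypoints(img, max_corners=500):
    # Shi-Tomasi corners via goodFeaturesToTrack, wrapped as KeyPoints for the descriptor.
    corners = cv2.goodFeaturesToTrack(img, max_corners, qualityLevel=0.01, minDistance=10)
    return [cv2.KeyPoint(float(x), float(y), 20.0) for [[x, y]] in corners]

orb = cv2.ORB_create()
kp_l, des_l = orb.compute(left, shi_tomasi_keypoints(left))
kp_r, des_r = orb.compute(right, shi_tomasi_keypoints(right))

# Brute-force matching on the binary ORB descriptors, with cross-checking enabled so a
# match is only kept if both descriptors are each other's nearest neighbour.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_l, des_r), key=lambda m: m.distance)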

B. Recovering The Fundamental Matrix

To recover the best possible estimation of the underlying fundamental matrix, RANSAC resampling is used in conjunction with a 7-point algorithm. RANSAC (Random Sample Consensus) is a resampling technique that generates candidate solutions by using subsets of the minimum number of data points required to estimate the underlying model parameters [3].

Once the epipolar geometry has been recovered for each image-pair through a fundamental matrix estimation, this information can be used to determine projective transformations H1, ..., Hn and H′1, ..., H′n, with n the number of image-pairs, that will rectify the model sheets. In each pair of rectified images, all corresponding epipolar lines become completely horizontal and at the same height. This transformation makes it a lot easier to calculate the disparity map in the next step.
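
A hedged sketch of this step, continuing the matching sketch above (kp_l, kp_r, matches, left, and right are assumed from there): OpenCV's findFundamentalMat runs the minimal-sample solver inside RANSAC, and stereoRectifyUncalibrated computes the two rectifying homographies for the pair.

import cv2
import numpy as np

pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches])
pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches])

# Estimate the fundamental matrix with RANSAC resampling; the mask flags the inliers.
F, inlier_mask = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC, 1.0, 0.99)
pts_l = pts_l[inlier_mask.ravel() == 1]
pts_r = pts_r[inlier_mask.ravel() == 1]

# Rectify both views so corresponding epipolar lines become horizontal and aligned.
h, w = left.shape
ok, H_l, H_r = cv2.stereoRectifyUncalibrated(pts_l, pts_r, F, (w, h))
rect_left = cv2.warpPerspective(left, H_l, (w, h))
rect_right = cv2.warpPerspective(right, H_r, (w, h))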

C. Generating the 3D model

A disparity map shows the distance between pixels in both images of a pair that represent the same feature in the pictured 3D scene.

Once each image-pair has been rectified, the rectified model sheets can be used to create two disparity maps for each image-pair. One disparity map can be calculated for the disparity from the left to the right view and one can be calculated for the disparity from the right to the left view. Since each model sheet is part of two different image-pairs (except for the left-most and right-most views), this means that for each model sheet two different disparity maps can be calculated.

Before calculating the disparity map itself, a Sobel filter is applied to the rectified images. After disparity values have been found, different post-filtering methods are applied to them in a refining pass. These include a uniqueness check, which defines a margin by which the minimum cost function should "win" over the second-best value, quadratic interpolation to fill in some missing areas, and speckle filtering to smooth the disparity map and filter out inconsistent pixels.

For calculating the disparity values, Hirschmuller's Semiglobal Matching algorithm is used with some minor computational optimisations. These optimisations consist of making the algorithm single-pass and thus only considering five directions instead of eight, matching blocks instead of individual pixels, and using a simpler cost metric than the one described by Hirschmuller [4].
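
A sketch of this step using OpenCV's StereoSGBM, which implements Hirschmuller's semiglobal matching with block matching, a Sobel-based prefilter, and the uniqueness and speckle post-filters mentioned above. The parameter values are illustrative only, not the thesis parameters (those are compared against OpenCV's defaults in table 4.5), and rect_left/rect_right are assumed from the previous sketch.

import cv2

block_size = 9
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,             # disparity search range; must be a multiple of 16
    blockSize=block_size,          # match blocks instead of individual pixels
    P1=8 * block_size ** 2,        # smoothness penalty for small disparity changes
    P2=32 * block_size ** 2,       # smoothness penalty for larger disparity changes
    preFilterCap=31,               # clips the Sobel-filtered input image
    uniquenessRatio=10,            # margin by which the best cost must win
    speckleWindowSize=100,         # speckle filtering of small inconsistent regions
    speckleRange=2,
)

# The result is a fixed-point disparity map scaled by 16.
disp_left = sgbm.compute(rect_left, rect_right).astype("float32") / 16.0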

Since the resulting disparity maps are calculated from the rectified model sheets, they need to be derectified to get the disparity maps for the original model sheets. This can be done easily by applying the inverse of the transformations H1, ..., Hn and H′1, ..., H′n that were used in the rectification step to each respective disparity map.
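
A minimal sketch of the derectification, assuming disp_left and the rectifying homography H_l from the sketches above: the disparity map is warped back into the original model-sheet frame with the inverse transformation.

import cv2
import numpy as np

h, w = disp_left.shape
# Nearest-neighbour interpolation avoids blending disparity values at object borders.
disp_orig_frame = cv2.warpPerspective(disp_left, np.linalg.inv(H_l), (w, h),
                                      flags=cv2.INTER_NEAREST)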

The disparity value for each pixel can be associated with a depth value through three-dimensional projection. This allows us to directly translate each disparity map into a point cloud. A convex hull algorithm (Quickhull) can then be run on the point cloud to generate a full 3D model of the scene. The reconstructed scene, however, is always determined only up to a similarity transformation (rotation, translation, and scaling). This is because it has been reconstructed from images of the scene, and images do not contain enough information about the absolute scale, position, and orientation of the pictured scene relative to the rest of the world.
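
A hedged sketch of this final step, assuming the derectified disparity map from above. Because the model sheets are orthographic renders, depth is taken here simply as the reciprocal of disparity (the result is only defined up to the similarity transformation mentioned above); SciPy's ConvexHull is a Quickhull implementation.

import numpy as np
from scipy.spatial import ConvexHull

ys, xs = np.nonzero(disp_orig_frame > 0)      # keep only pixels with a valid disparity
depth = 1.0 / disp_orig_frame[ys, xs]         # orthographic case: depth ~ 1 / disparity
points = np.column_stack([xs, ys, depth])

hull = ConvexHull(points)                     # Quickhull on the reconstructed point cloud
vertices = points[hull.vertices]              # hull vertices of the rough 3D model
faces = hull.simplices                        # triangle indices, usable as a draft mesh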

III. EVALUATION USING COMPUTER-GENERATED MODEL SHEETS

In order to be able to evaluate the proposed method, a ground truth disparity map is needed. The ground truth disparity maps in this thesis were generated using a depth shader in Blender 2.79. Since the model sheets use an orthographic projection, the disparity values of the model sheet are the multiplicative inverse of their depth values and can thus be calculated in a shader program. The model sheets themselves were generated using the Freestyle function in Blender 2.79's Cycles renderer. A black Freestyle line with an absolute thickness of 1.5 pixels and a crease angle left at the default 134.43° was combined with a completely white object material with its emission strength set to 1, so the 3D model's faces would be uniformly white, mimicking the typical black-and-white line-art style of real model sheet drawings.
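
A minimal sketch of that depth-to-disparity conversion, assuming the Blender depth pass has already been exported as a floating-point image (reading EXR files with OpenCV may require enabling its OpenEXR support); the file name is a placeholder.

import cv2
import numpy as np

depth = cv2.imread("teapot_depth.exr", cv2.IMREAD_UNCHANGED)
if depth.ndim == 3:
    depth = depth[:, :, 0]                            # keep a single depth channel

disparity_gt = np.zeros_like(depth, dtype=np.float32)
foreground = depth > 0                                # background pixels carry no depth
disparity_gt[foreground] = 1.0 / depth[foreground]    # orthographic: disparity = 1 / depth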

Fig. 1. Model sheet generated using the Freestyle function of the Cycles renderer in Blender 2.79.

A model sheet generated with this method for the Utah Teapot model can be seen in figure 1, while the ground truth disparity map for this model can be found in figure 2. Figure 3 contains the disparity map generated with the proposed method for this model sheet.

Fig. 2. Ground truth disparity map generated for the model sheet shown in figure 1.

Fig. 3. Disparity map generated with the proposed method for the model sheet shown in figure 1.

IV. CONCLUSION

The proposed method shows some promising results when used on model sheets, but the final models are currently not yet usable in production environments. With a lot of manual tweaking, however, it is possible to get some good approximations, as was shown for the Utah Teapot dataset.

The method used to automatically generate model sheets provides some nice-looking results. This method could even be used as a stylistic choice in a 3D production, such as a video game or animated movie, or in further research on this topic.

REFERENCES

[1] Shi, Jianbo and Tomasi, Carlo, Good features to track, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.

[2] Rublee, Ethan and Rabaud, Vincent and Konolige, Kurt and Bradski, Gary, ORB: An efficient alternative to SIFT or SURF, 2011 International Conference on Computer Vision, pp. 2564-2571, 2011.

[3] Derpanis, Konstantinos G., Overview of the RANSAC Algorithm, Image Rochester NY, vol. 4, no. 1, pp. 2-3, 2010.

[4] Hirschmuller, Heiko, Stereo Processing by Semiglobal Matching and Mutual Information, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328-341, 2007.

Contents

Admission to loan
Preface
Abstract
Extended abstract
Contents

1 Introduction

2 Related Work
  2.1 Sketch-Based Modelling
    2.1.1 Evocative Modelling
    2.1.2 Constructive Modelling
  2.2 Computer Vision
    2.2.1 Epipolar Geometry
    2.2.2 Corner Detection
    2.2.3 Stereo Matching
    2.2.4 The Fundamental Matrix
    2.2.5 Projective Rectification
    2.2.6 Disparity Map

3 Methodology
  3.1 Overview
  3.2 Model Sheets
  3.3 Corner Detection and Matching
  3.4 Epipolar Geometry
  3.5 Image Rectification
  3.6 Disparity Map
  3.7 Point Cloud and Convex Hull
  3.8 Frameworks

4 Method Evaluation
  4.1 Corner Detection and Matching
    4.1.1 Quality metrics
    4.1.2 Template based detectors
    4.1.3 Gradient based detectors
    4.1.4 SIFT and SURF detectors
    4.1.5 Feature descriptors
    4.1.6 Corner matching
  4.2 Epipolar Geometry
  4.3 Rectification
  4.4 Disparity Map
    4.4.1 Evaluating the Global Method

5 Conclusion

6 Future Work
  6.1 Possible Alternatives
  6.2 Possible Extensions and Adaptations

List of Figures

1.1 An example of a model sheet used for 2D animation in Walt Disney's animated short Winnie the Pooh and the Honey Tree. © 1966 The Walt Disney Company.
1.2 An example of a model sheet used for 3D modelling in the animated movie Superman/Batman: Apocalypse and a corresponding 3D model, reproduced from [3]. Model sheet: © 2010 Warner Bros. Animation ‖ 3D model: © 2017 David Dias.

2.1 If the user inputs the strokes on the left side, the SKETCH system will produce the primitives on the right (reproduced from [14]).
2.2 A template retrieval system matching sketches to predefined 3D models (reproduced from [15]).
2.3 An example of a mechanical object, constructed in SolidWorks (reproduced from [18]).
2.4 A skeleton and its corresponding polygonised implicit surface in BlobMaker, a free-form modelling system (reproduced from [20]).
2.5 Iterative sketch-based modelling. Surface details and additive augmentations are added to an existing model (reproduced from [12]).
2.6 The epipolar geometry between two views of the same scene. The point x is projected at xL and xR in the two views (reproduced from [22]).
2.7 One of the disparity maps from the Tsukuba dataset [68].

3.1 A comparison between more classical computer vision datasets and model sheets.
3.2 The different components of the proposed method as well as the input and output at each step, and the feedback loops that can be used to optimise an implementation.
3.3 Blender model sheet-like renders for a very basic 3D character.
3.4 A recreation of the model sheets from figure 1.2, albeit using a slightly different mesh.
3.5 One of the ground-truth depth maps rendered with Blender 2.79.
3.6 Blender's node-based compositing tool as it was set up to generate the ground-truth depth maps.
3.7 Performance of different feature detectors for different viewpoint changes. Graph reproduced from [41].
3.8 A rectified image-pair with its rectified matches.
3.9 A disparity map retrieved with the proposed method.
3.10 A disparity map generated from the model sheets shown in figure 1.2.

4.1 Rectified matches for one of the generated model sheets for different feature detectors and the corresponding rectification quality metrics. The rotation angle between both model sheet views is 5°.
4.2 The model sheet views used in this test. From left to right, the camera rotation is 40°, 45°, and 50°.
4.3 Different quality metrics as a function of the parameters of the 7-point RANSAC algorithm.
4.4 A typical disparity map found with local-approach block matchers on model sheets. The spaces between lines are empty.
4.5 Some of the methods that were tried unsuccessfully to improve the quality of local-approach matchers on model sheets.
4.6 The disparity quality metrics as a function of the algorithm's block size.
4.7 Disparity maps generated with large and less-large block sizes, as well as their ground truth disparity map.
4.8 The left-view and right-view disparity maps, and their combination.

List of Tables

4.1 A comparison of the performance of different feature detectors on model sheets using the ORB descriptor and a brute-force matching algorithm.
4.2 A comparison of the performance of different corner detectors on model sheets using different descriptors. MSD = Mean Sampson distance, SEV = Smallest eigenvalue.
4.3 The different feature detectors and descriptors that were evaluated, the name of their OpenCV implementations, and the parameters that were used during the evaluations.
4.4 A comparison of the difference in quality metrics when enabling or disabling cross-checking in the brute-force matcher for the Shi-Tomasi/ORB combination.
4.5 A comparison of OpenCV's default Semiglobal Matching parameters and the parameters proposed for use on model sheets.
4.6 The datasets used to evaluate the global method.
4.7 The reconstructed disparity maps for all evaluated datasets.
4.8 The parameters used to generate the results in table 4.7.

Listings

4.1 Calculating the correlation between disparity maps
4.2 Implementation of the Middlebury bad 2.0 metric

Chapter 1

Introduction

Model sheets are precisely drawn groups of pictures that outline the size and construction of a prop or character's design from a number of viewing perspectives. They are created both to accurately maintain detail and to keep designs uniform across a production whilst different people are working on them [1]. Model sheets are used both in 2D animation and in 3D modelling.

In 2D animation, model sheets enable a number of animators working across a production to achieve consistency in representation across multiple shots. Model sheets used for 2D animation often contain different facial expressions, different poses, and annotations to a production's animators about how to develop and use particular features of a character, as can be seen in figure 1.1 [2].

During the production of a 3D video game or animated movie, model sheets are sent to the modelling department, which uses them to create the final character models [1]. Figure 1.2 shows an example of a model sheet used during the production of a 3D animated movie and a professional reproduction of that character. Model sheets can contain full colour and shading but usually just consist of line drawings. The rest of this thesis will mainly focus on model sheets used during 3D modelling, which only show a character's rotation without different poses or facial expressions.

A typical procedure for the geometric modelling of three-dimensional models is to start with a simple primitive such as a cube or sphere, and to gradually construct a more complex model through successive transformations or a combination of multiple primitives [4].

This method of 3D modelling is called primitive-based modelling and is a type of polygonal modelling.


(a) The model sheet showing poses and facial expressions.

(b) A still showing one of the finalised animations described on the model sheet.

Figure 1.1: An example of a model sheet used for 2D animation in Walt Disney's animated short Winnie the Pooh and the Honey Tree. © 1966 The Walt Disney Company.


Figure 1.2: An example of a model sheet used for 3D modelling in the animated movie Superman/Batman: Apocalypse and a corresponding 3D model, reproduced from [3]. Model sheet: © 2010 Warner Bros. Animation ‖ 3D model: © 2017 David Dias.


There exist other methods for creating (non-polygonal) 3D models, such as NURBS (non-uniform rational B-splines), subdivision surfaces, constructive solid geometry, and implicit surfaces, but objects using these representations often need to be converted into polygonal form for visualisation and computation anyway, since today's graphics hardware is highly specialised in displaying polygons at interactive rates [5]. Because today's graphics hardware is so specialised in displaying polygons, methods like NURBS are rarely used for video games and other interactive media [6].

The drawback to primitive-based modelling, however, is that models built in a primitive-based way tend to contain more faces than necessary to define their structure, because there can be no overlap in a pure primitive-modelled object. Instead, each part of an object is incised into its immediate neighbours. Another drawback is that for shapes other than the primitives themselves it can be difficult to position each point exactly right [7]. Also, a lot of work has to be done to get from a simple primitive to a high-polygon-count full 3D model.

This thesis posits that the process of primitive-based geometric modelling of three-dimensional models that is used today could be improved by providing a better starting point for 3D modellers than a simple primitive, by leveraging the model sheets they already receive from the animators and designers. This thesis proposes a new method, built from multiple existing computer vision methods, that can use existing model sheets to generate a rough draft for a three-dimensional model that can then be further refined by 3D artists. In order to accomplish this, the currently existing related work will first be examined in chapter 2. Afterwards, the methodology used to accomplish this goal will be described in chapter 3. In chapter 4, the proposed method will be evaluated on different model sheets, before coming to a conclusion in chapter 5 and providing some pointers for possible future research in chapter 6.


Chapter 2

Related Work

Biederman in [8] researched the psychological theory of human image understanding. He concluded that all components that are postulated to be the critical units for recognition are edge-based and can thus be depicted by a line drawing. Colour, brightness, and texture are secondary routes for recognition. Biederman's findings are important for animators and other visual artists, but this more psychological side is not the focus of this work. In this work, we will instead focus more on the research done by Barrow and Tenenbaum in [9] and [10]. In these publications, Barrow and Tenenbaum investigated the development of a computer model for interpreting two-dimensional line drawings as three-dimensional surfaces and surface boundaries. Their model interprets each line separately as a three-dimensional space curve that then gets assigned a local surface normal [10].

Another, perhaps more surprising but also highly relevant, research domain is that of architecture. Using 3D building models is extremely helpful throughout the architecture, engineering, and construction life cycle. Such models let designers and architects virtually walk through a project to get a more intuitive perspective on their work. Digital models of buildings can check a design's validity by running computer simulations of energy, lighting, fire, and other characteristics. These models are also used in real estate, virtual city tours, and video gaming. Because of their wide applicability, researchers and CAD developers have been trying to automate and accelerate the conversion of 2D architectural drawings into 3D models. Since paper floor plans still dominate the architectural workflow, these automatic plan conversion systems must also be able to process raster images. In order to do this, they use a combination of image-processing and pattern-recognition techniques. These systems lack generality, though: they are typically constrained to a small set of predefined symbols [11].


The works of Barrow and Tenenbaum, and the work done in the architectural field, are part of two larger research fields, both of which are closely related to the problem of reconstructing 3D models from model sheets.

The first of these fields explores sketch-based modelling, a technique that automates or assists the process of translating a 2D sketch into a 3D model [12].

The second field is that of computer vision, which explores techniques to construct explicit, meaningful descriptions of physical objects from images by using intrinsic information that may reliably be extracted from the input, similar to how the human visual system works [13].

This thesis positions itself at the intersection of both research topics. The proposed method will draw inspiration from existing computer vision techniques to reconstruct 3D models by extracting information from the 2D sketches of those models. The next sections in this chapter will further explore work that was previously done in these two research fields and how that research could apply to the topic of this thesis.

2.1 Sketch-Based Modelling

Sketch-based modelling techniques enable people to create complex 3D models in a way that is as intuitive as possible for humans [12]. As such, most techniques try to make users input as little information as possible, usually in the form of simple, 2D strokes. These strokes can then be converted to 3D models in one of two ways: either through evocative modelling, where multiple prebuilt 3D objects are combined in a 3D scene to approximate the desired 3D model, or through constructive modelling, which uses a rule-based approach to build the entire 3D model as one whole object [12]. Both have their own advantages and disadvantages.

2.1.1 Evocative Modelling

Evocative modelling systems are systems that contain a collection of pre-existing 3D shapes and that approximate a complex model by placing these shapes according to the user's directives. Within evocative modelling systems there is a further distinction between two subsets:

• Iconic systems map a set of 3D primitives to simple strokes. In the SKETCH application [14], for example, users can create and place 3D models in a 3D scene by forming sequences of five different types of strokes that are then interpreted and replaced by 13 different primitives. A subset of these strokes and their resulting primitives can be seen in figure 2.1. This small number of primitive objects often requires the user to combine multiple primitives to create the object they want. Only using primitives even precludes some complex objects, such as freeform surfaces, from being made at all [14]. Additionally, when combining different primitives to form a more complex model, the end result will contain vertices that lie on the "inside" of the object and thus aren't visible. However, these invisible vertices are still stored and rendered by the system, since the modelling system only knows how to work with the set of pre-existing primitives. This results in a worse mesh topology and worse performance.

Figure 2.1: If the user inputs the strokes on the left side, the SKETCH system will produce the primitives on the right (reproduced from [14]).

• The other main subset of evocative modelling systems are template systems. Template systems contain a set of, possibly very detailed, 3D template objects that can be placed in the 3D scene in any way the user wants, for example by dragging and dropping the objects to the desired rotation, orientation, and scale. An example of the "Magic Canvas" template system can be seen in figure 2.2. This system tries to automatically match multiple models from its database to 2D sketches and lets the user pick one of these models to place in the scene [15].

Figure 2.2: A template retrieval system matching sketches to predefined 3D models (reproduced from [15]).

While both of these evocative modelling systems are good at allowing the user to create a 3D model without requiring any modelling knowledge, all evocative systems share at least one big issue: they are based on prior knowledge in the form of a database of prebuilt 3D shapes. Users of evocative systems are limited to the prebuilt 3D shapes included with the system. A pre-existing database takes up a lot of memory, and searching it for the desired shapes takes time. Both of these problems become worse as the size of the included database increases. In some evocative modelling systems, such as Kenney's Asset Forge [16], users are allowed to add their own shapes. This, however, requires them to have modelling knowledge, thus defeating the purpose of allowing users to create a 3D model without any modelling knowledge.

Template systems can, for example, contain a database of domain-specific models, thus allowing the user to create very detailed models or 3D scenes within a certain domain, but not in other domains that might require a database of different models. For example, in architecture there are a number of existing systems that contain a database of different types of walls, doors, windows, and furniture. Since architects already have a set way of representing each of these in their 2D drawings, template-based modelling systems can translate those icons to one of their pre-built templates to create a 3D model from an architectural drawing [11]. This same template system, however, won't be very useful when trying to design a 3D model for an industrial machine. Systems to model industrial machines don't need doors and windows; they need gears, screws, and other mechanical components.

Iconic systems, on the other hand, do allow the user to create models from different domains with the same library, but the user generally won't be able to add the level of detail they want, since they're limited to simple primitives. Using iconic systems, users only have access to primitives like cubes, spheres, and cylinders, instead of more complicated models like windows and furniture, or gears and screws.


The easiest way to improve evocative modelling systems is by including a bigger object library, but this in turn also leads to larger performance and resource requirements to process, match, and store these objects. The requirements of a bigger evocative modelling system become larger regardless of whether or not the user actually needs those extra objects for the 3D model or scene they're trying to create [12].

2.1.2 Constructive Modelling

Whereas evocative modelling systems need to have prior knowledge of some 3D shapes to build a new 3D model, constructive systems use a set of rules to procedurally generate a completely new 3D shape based on some 2D drawing [17]. Constructive modelling systems are thus not restricted by a set database of pre-existing objects, like evocative modelling systems are, but rather by the robustness of the ruleset that is used, and by the ability of the system's interface to expose its full potential [12]. Because of this, they generally allow for more applications in different fields, but are also much harder to use correctly. In order to use a constructive modelling system to its full potential, the user might want to edit, delete, or add rules to the existing ruleset. Editing the ruleset of a constructive modelling system requires the user to have pre-existing knowledge about how these systems work, and possibly requires them to understand the language that was used to design the original ruleset. Such a strong requirement is not feasible for everyone working in a complex domain context without first receiving proper training and insight into the system itself [12].

In order to reduce the total number of possible interpretations of a given 2D drawing, constructive modelling systems sometimes constrain the complexity of their ruleset. By doing so they are again exploiting prior knowledge about the domain in which these systems will be used, similar to template modelling systems [12]. We can define three main groups of possible drawing interpretations within constructive modelling systems:

• Mechanical objects consist mainly of hard-edged, planar objects. These are the kinds of modelling systems that are often used in engineering fields, where the surfaces of objects are usually flat, and corners and edges are well-defined. These systems have limited support for curved strokes, since their reconstruction is primarily based on a straight-line representation [12]. The design and specification of engineered objects is an important application of computer modelling. As such, it was one of the first research areas within sketch-based modelling [12], [19]. However, nowadays the usage of solid modelling CAD (Computer Aided Design) systems has become more prevalent, although some designers still prefer to use constructive modelling techniques, since they can use their intuition more effectively when sketching than they can when using a solid modeller [19].

Figure 2.3: An example of a mechanical object, constructed in SolidWorks (reproduced from [18]).

• Free-form objects are almost the exact opposite of mechanical objects in that they consist of smooth, more natural-looking objects. There are different ways to create free-form design systems, but one of the more prevalent methods is to use a skeleton-based approach that tries to reconstruct a model's skeleton based on contour sketches. Popular systems that use this skeleton-based technique, like BlobMaker [20], use variational implicit surfaces. Because of their variational implicit surfaces, free-form objects created with skeleton-based free-form design systems are limited to closed surfaces and completely smooth shapes [20]. An example of the skeleton-based free-form modelling technique in BlobMaker can be found in figure 2.4. Other free-form modelling systems, such as Teddy [4], use a triangulated mesh that can be easily modified, but are limited to constructing a single object and don't support hierarchy within that object [4], [20].

Figure 2.4: A skeleton and its corresponding polygonised implicit surface in BlobMaker, a free-form modelling system (reproduced from [20]).

• Multi-view systems typically interpret the strokes in a 2D drawing as object boundaries. Different drawings can then be interpreted as different views of the same objects, or a single image can be combined with a set epipolar geometry to allow the user to "sketch in 3D" by drawing in a different view than the original sketch's view. However, sketching in 3D without interactive feedback is difficult for most humans, since our visual system is built around 2D stimuli. Thus, most systems only implement simple reconstruction within an iterative modelling paradigm. That is, rather than the user creating actual 3D sketches or multiple sketches of the same object, they can reconstruct a single sketch, rotate the model, sketch a new part or a deformation, ad infinitum until the desired result is achieved [12]. An example of what one iteration can look like in an iterative sketch-based modelling technique is shown in figure 2.5.

Figure 2.5: Iterative sketch-based modelling. Surface details and additive augmentations are added to an existing model (reproduced from [12]).

Since model sheets already contain drawings of different views of the same three-dimensional object, the remainder of this chapter will focus on this last set of constructive modelling systems: the multi-view systems.

2.2 Computer Vision

The field of computer vision focuses on ways to automatically extract information from images [13]. One of the applications of computer vision, and the one most relevant to this dissertation, is the field of multi-view stereo reconstruction, where algorithms are used to try to reconstruct a complete 3D object model from a collection of images taken from known camera viewpoints [21].

2.2.1 Epipolar Geometry

Figure 2.6: The epipolar geometry between two views of the same scene. The point x is projected at xL and xR in the two views (reproduced from [22]).

Consider the situation shown in figure 2.6 of a common scene viewed by two pinhole cameras. The locus of all points that map to the point xL in the left image consists of a straight line through the centre of the camera used in the left view. As seen from the second camera, this straight line of points that map to xL maps to a straight line in the right image known as an epipolar line. Any point in the right view matching the left view's point xL must lie on this epipolar line. All epipolar lines in the right view corresponding to points in the left view meet in one point eR, called the epipole. The epipole is the point where the centre of projection of the left camera would be visible in the right image. Similarly, there is an epipole eL in the left view, defined by reversing the roles of the two views in the above discussion [23]. Epipoles don't necessarily lie within the other view: it is possible that one camera is not able to see the centre of projection of the other camera.

There exists a projective mapping from points in the left view to epipolar lines in the right view and vice versa. Furthermore, there exists a 3x3 matrix F, called the fundamental matrix, which maps points in one view to the corresponding epipolar line in the other view according to the mapping x ↦ Fx, and thus defines the epipolar geometry between both images [23]. It can be proven that the fundamental matrix between two images can be computed from correspondences of those images' scene points alone, without requiring knowledge of the cameras' internal parameters or relative pose [24]. However, exact point correspondences often cannot be retrieved from an image, since a three-dimensional scene point can be projected onto multiple pixels or fall in between pixels in one of the views. Because the exact correspondences cannot always be retrieved, the exact fundamental matrix cannot always be recovered and an estimation must be used.
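
A small illustration of the mapping x ↦ Fx, assuming an estimated fundamental matrix F and a few left-view pixel coordinates: OpenCV returns, for each point, the coefficients (a, b, c) of the corresponding epipolar line ax + by + c = 0 in the other view.

import cv2
import numpy as np

pts_left = np.float32([[120, 85], [240, 160]]).reshape(-1, 1, 2)   # example coordinates
lines_right = cv2.computeCorrespondEpilines(pts_left, 1, F).reshape(-1, 3)

for a, b, c in lines_right:
    # Any correct match in the right view must satisfy a*x + b*y + c = 0 (up to noise).
    print(a, b, c)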

The accuracy of an estimation for the fundamental matrix (and subsequently the epipolar geometry) is crucial for applications such as 3D reconstruction. The fundamental matrix estimation accuracy is a function of the number of correct point correspondences attained between the images that are being evaluated [25]. Therefore, 3D reconstruction systems must strive to establish as many correct correspondences between images as possible. To this end, a corner detector is first applied to each image to extract high-curvature points. A classical correlation technique is then used to establish matching candidates between the two images [26]. There exist multiple techniques to detect corners in images; the ones most relevant to multi-view reconstruction are briefly explored next.

2.2.2 Corner Detection

Corner detection is a low-level processing step which serves as an essential component of many computer vision based applications [27]. When recovering the epipolar geometry between two views of the same scene, a corner detector must first be applied to each view separately in order to be able to establish correspondences between images [26]. Thus, using a good corner detector is pivotal to recovering a good estimation for the fundamental matrix and the epipolar geometry it describes.

There exist three different classes of corner detectors: gradient based, template based, and contour based methods [27]. Each of these classes will be briefly described next.


Gradient Based Corner Detection

Gradient based corner detection is based on gradient calculations. Most of the earlier corner detectors are gradient based detectors. Gradient based corner detectors suffer from noise sensitivity and a high computational complexity [27]. One of the most-used gradient based corner detectors is the Harris detector [28]. Other gradient based corner detectors proposed in early papers are the LTK (Lucas-Tomasi-Kanade) [29] and Shi-Tomasi [30] corner detectors. The main difference between these detectors lies in the implementation of their cornerness measurement function, the function which calculates how likely it is that a given pixel is part of a corner [27].
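
A short sketch contrasting two of the detectors named above as exposed by OpenCV: cornerHarris returns the Harris cornerness response for every pixel, while goodFeaturesToTrack implements the Shi-Tomasi criterion (the smallest eigenvalue of the local gradient matrix). The thresholds and file name are illustrative.

import cv2
import numpy as np

gray = cv2.imread("model_sheet_view.png", cv2.IMREAD_GRAYSCALE)

harris_response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
harris_corners = np.argwhere(harris_response > 0.01 * harris_response.max())

shi_tomasi_corners = cv2.goodFeaturesToTrack(gray, maxCorners=500,
                                             qualityLevel=0.01, minDistance=10)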

Template Based Corner Detection

Template based corner detectors find corners by comparing the intensity of surrounding pixels with that of the centre pixel that is being evaluated. Templates are first defined and placed around the centre pixels. The cornerness measurement function is devised from the relations between the surrounding and centre pixel intensities. The computational cost of template based corner detection is relatively lower than that of gradient based methods, which makes these methods faster. In recent years, templates have been combined with machine learning techniques (decision trees in particular) for fast corner detection. Examples of template based corner detectors include SUSAN (Smallest Univalue Segment Assimilating Nucleus), which is a more traditional template based corner detector [31], and FAST (Features from Accelerated Segment Test), which uses machine learning in the form of a decision tree [32], and its derivations FAST-ER [33] and AGAST (Adaptive and Generic Accelerated Segment Test) [34]. Although the application of machine learning helps reduce the computational cost of corner detection, it can also cause database-dependent problems when the training data does not cover all possible corners. Template based detectors can detect a larger number of corners than gradient based methods. Since more points lead to more reliable matching, the larger number of detected corners is preferred in wide baseline matching [27]. However, the number of detected corners is not stable under different imaging conditions for template based corner detectors [35]. Another disadvantage of template based methods is the lack of effective and precise cornerness measurements [27].
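
A minimal sketch of one of the template based detectors listed above, FAST, through OpenCV; AGAST is exposed through an analogous interface (AgastFeatureDetector_create). The threshold value and file name are illustrative.

import cv2

gray = cv2.imread("model_sheet_view.png", cv2.IMREAD_GRAYSCALE)

# Corners are reported where a contiguous arc of pixels on the surrounding circle is
# brighter or darker than the centre pixel by more than the threshold.
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(gray, None)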


Contour Based Corner Detection

Contour based corner detection is based on the result of contour and boundary detection. These methods aim to find the points with maximum curvature in the planar curves that compose the edges. Traditional contour based corners are specified in binary edge maps. Curve smoothing and curvature estimation are two essential steps for these detectors, with Gaussian filters being the most widely used smoothing function. Examples of contour based corner detectors include DoG-curve (Difference of Gaussians curve) [36], DoH (Determinant of the Hessian) [37], ANDD (Anisotropic Directional Derivative) [38], hyperbola fitting [39], and ACJ (A Contrario Junction) detection [40]. Contour based corner detection differs significantly from gradient and template based methods in both detection framework and application area. The detected corners are used more in shape analysis and shape based image compression than in wide baseline matching [27].

Feature Descriptors

After corners have been detected, a feature descriptor is used. A descriptor is a vector that characterises the local image appearance around the location identified by a corner detector. After a descriptor is calculated, that corner can then be matched with one or more corners in other images [41]. The most widely used descriptor, SIFT (Scale-Invariant Feature Transform), is computed from gradient information [42]. Local appearance is described by histograms of gradients, which provides a degree of robustness to translation errors [41], [43]. The SIFT algorithm also defines a corner detector, but the SIFT detector uses a Difference of Gaussians method, which is a contour based corner detection method and thus less suitable for wide baseline matching [27], [42]. SURF (Speeded Up Robust Features) is another popular algorithm containing both a feature detector and a feature descriptor. The SURF algorithm was inspired by SIFT but is faster while still providing a similar performance [43]. The SURF detector, also called the Fast-Hessian detector, is a region detector instead of a corner detector, however [43], and thus not suitable for retrieving the epipolar geometry between images [26]. Both the SIFT and SURF algorithms are patented in the US, although the SIFT patent has recently expired on April 18th, 2020 [44], [45].

More recently, there has been an increasing amount of research into binary descriptors such as BRIEF (Binary Robust Independent Elementary Features) [46]. Binary descriptors run in a fraction of the time required by SURF and similar algorithms, while still having a similar or better performance and a reduced memory footprint [27], [46]. Binary descriptors work by computing binary strings for every detected feature by comparing the intensities of pairs of points around that feature. Furthermore, the descriptor similarity for BRIEF descriptors can be evaluated using the Hamming distance instead of the L2 norm, which is used in SIFT and SURF. Calculating a Hamming distance is very efficient [46]. ORB (Oriented FAST and Rotated BRIEF) [47] and BRISK (Binary Robust Invariant Scalable Keypoints) [48] take the process of feature description one step further by integrating corner detection based on binary decision trees with binary feature descriptors. The attached binary descriptor reduces both the storage burden and the time cost of matching [27]. ORB, for example, is two orders of magnitude faster than SIFT while performing just as well in most situations [47].
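
A minimal sketch of a binary descriptor and its Hamming-distance comparison, using OpenCV's ORB implementation; the image name is a placeholder.

import cv2

gray = cv2.imread("model_sheet_view.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)

# Each ORB descriptor is a 256-bit binary string stored in 32 bytes; the similarity of two
# descriptors is their Hamming distance, i.e. the number of differing bits.
distance = cv2.norm(descriptors[0], descriptors[1], cv2.NORM_HAMMING)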

2.2.3 Stereo Matching

Once a set of corners, and their matching descriptors, has been found for each image, they need to be matched with the corners in other images. These matches can then later be used to calculate an estimation of the underlying fundamental matrix for that image-pair. Searching for nearest neighbour matches is the most computationally expensive part of many computer vision algorithms [49].

The solution to a matching problem can be found by using a brute-force matching algorithm. A brute-force matching algorithm works by comparing every feature from image I1 to every feature from image I2. Feature F1 in I1 is said to match feature F2 in I2 if the distance between their descriptors is minimal, i.e. if the descriptor for F2 is the closest to the descriptor for F1 according to some distance metric.

The distance metric used in matching algorithms depends on the feature descriptors that were used. The two main metrics used for the distance between features are the Hamming distance and the L2 distance. The Hamming distance uses only bit manipulations and is thus very fast, but it is only usable with binary descriptors [50]. The distance between vector-based descriptors, like SIFT and SURF, can be calculated using the L2 distance, which is slower and does not work for binary descriptors.

In order to increase the quality of matches, each match may need to pass some test before it can be validated. An often-used, strict match validity test is called cross-checking. When using cross-checking, it is not sufficient that the descriptor for F2 is the closest to the descriptor for F1; the reverse must also be true before a match can be validated. This validity test greatly reduces the probability of error [26].


Another often-used match validity test is Lowe's ratio test. Lowe's ratio test considers the ratio of the distance between F1 and F2 to the distance between F1 and F′2, where F2 is the best match for F1 while F′2 is the second-best match for F1. A match is only validated when this ratio falls below a pre-defined threshold, i.e. when the best match is significantly closer than the second-best match. The reasoning behind Lowe's ratio test is that for false matches, there will likely be a number of other false matches within similar distances, given a high-dimensional feature space [42].
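
A sketch of Lowe's ratio test, assuming descriptors des_l and des_r from two views and a matcher without cross-checking, so the two nearest neighbours of each descriptor can be retrieved; the 0.75 threshold is a commonly used illustrative value, not a thesis parameter.

import cv2

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
knn_matches = matcher.knnMatch(des_l, des_r, k=2)

good_matches = []
for pair in knn_matches:
    if len(pair) < 2:
        continue                     # no second neighbour to compare against
    best, second_best = pair
    # Keep the match only if the best candidate is significantly closer than the runner-up.
    if best.distance < 0.75 * second_best.distance:
        good_matches.append(best)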

Since some features may have no matches, e.g. because they are occluded in one of the images, a threshold can be used for the "furthest distance" that is still considered a match to further reduce the number of false positives [51].

As brute-force algorithms need to compare every feature from image I1 to every feature from image I2, they are quadratic in the number of extracted features, which makes them impractical for most applications [51]. This high cost has generated an interest in heuristic algorithms that perform approximate nearest neighbour search. Heuristic algorithms sometimes return non-optimal neighbours but can be orders of magnitude faster than exact searches, while still providing near-optimal accuracy [49]. There exist hundreds of different approximate nearest neighbour search algorithms [51]. Initially, the most widely used algorithm for nearest-neighbour search was the kd-tree [52], which works well for exact nearest neighbour search in low-dimensional data, but quickly loses its effectiveness as dimensionality increases [49]. kd-Trees work by dividing the multi-dimensional feature space along alternating axis-aligned hyperplanes, choosing the threshold along each axis so as to maximise some criterion [53]. Later, this algorithm was adapted by [54] to use multiple randomised kd-trees as a means to speed up approximate nearest-neighbour search, performing well over a wide range of problems [49]. The algorithm proposed by [54] using randomised kd-trees was further adapted by [49] into the FLANN matcher (Fast Library for Approximate Nearest Neighbours). The FLANN matching algorithm works by maintaining a single priority queue across all randomised kd-trees so that the search can be ordered by increasing distance to each bin boundary. The degree of approximation is determined by examining a fixed number of leaf nodes, at which point the search is terminated and the best candidates are returned [49].
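
A sketch of an approximate matcher using OpenCV's FLANN wrapper, assuming binary (ORB-style) descriptors, for which an LSH index is configured; for SIFT or SURF descriptors a randomised kd-tree index (algorithm=1) would be used instead. The parameter values follow the common OpenCV examples and are illustrative.

import cv2

FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=6, key_size=12, multi_probe_level=1)
search_params = dict(checks=50)   # number of leaf nodes examined; trades accuracy for speed

flann = cv2.FlannBasedMatcher(index_params, search_params)
knn_matches = flann.knnMatch(des_l, des_r, k=2)   # approximate two nearest neighbours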


2.2.4 The Fundamental Matrix

The fundamental matrix F is a homogeneous 3x3 matrix which defines the epipolar geometry between two different views of the same three-dimensional scene. The fundamental matrix was first described in [55] as a generalisation of the essential matrix first described a decade earlier in [56]. The essential matrix of an image-pair is dependent on the calibration of the cameras used to capture both images, while the fundamental matrix is completely independent of the cameras or scene structure [24]. The fundamental matrix is the only geometric information available from two uncalibrated images [26]; it can be computed from correspondences of image scene points alone, without requiring knowledge of the cameras' internal parameters or relative pose [24]. Recovering the fundamental matrix is one of the most crucial steps of many computer vision algorithms [26].

It can be proven that the fundamental matrix F is only defined up to a scale factor [26]. Because scaling is not significant, the 3x3 matrix F has eight remaining independent ratios. However, it can also be proven that F satisfies the constraint

det F = 0, (2.1)

which means that the fundamental matrix F only has seven degrees of freedom [24]. That equation (2.1) holds true can also be seen intuitively: F can map multiple points to the same epipolar line, which means there can exist no inverse mapping and thus F cannot be invertible, which in turn means that its determinant must be zero [24].

Given a pair of matching points x and x′, the fact that x′ lies on the epipolar line Fx means that

(x′)ᵀ F x = 0. (2.2)

Since the matrix F only has seven degrees of freedom, it is possible to determine F by solving a set of linear equations of the form (2.2) given at least 7 point matches [23].

As discussed earlier in section 2.2.1, the accuracy of the fundamental matrix estimate is a function of the number of correct correspondences attained between images [25]. Most algorithms rely on a minimum of eight such correspondences, such as Hartley's original algorithm [55], [57]. There exist techniques that can estimate the epipolar geometry with as few as five point correspondences by exploiting additional knowledge about the camera locations and the projected scene [24], [58].
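
A hedged numpy sketch of the classic linear estimate from at least eight correspondences: each match contributes one equation of the form (2.2), the stacked system is solved by SVD, and the rank-2 constraint det F = 0 from (2.1) is then enforced by zeroing the smallest singular value. Point normalisation, which a practical implementation should add, is omitted for brevity.

import numpy as np

def linear_fundamental_matrix(pts1, pts2):
    # pts1, pts2: (N, 2) arrays of matching pixel coordinates, with N >= 8.
    A = np.array([[x2 * x1, x2 * y1, x2,
                   y2 * x1, y2 * y1, y2,
                   x1, y1, 1.0]
                  for (x1, y1), (x2, y2) in zip(pts1, pts2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null-space vector of the stacked constraints

    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                        # enforce det F = 0 (rank two)
    return U @ np.diag(S) @ Vt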


Algorithms for estimating the fundamental matrix between two images have proven to be very sensitive to noise [26]. Noise is introduced by outliers in the initial correspondences. Outliers can be caused by two things:

• Bad locations. In the estimation of the fundamental matrix, the location error of a point of interest is assumed to exhibit Gaussian behaviour. This assumption is reasonable since the error in localisation for most points of interest is small (within one or two pixels), but a few points are possibly incorrectly localised (more than three pixels). These points will severely degrade the accuracy of the fundamental matrix estimation [26].

• False matches. In the establishment of correspondences, only heuristics have been used. Because the only geometric constraint, i.e. the epipolar constraint in terms of the fundamental matrix, is not yet available, many matches are possibly false. These will completely spoil the estimation process, and the final estimate of the fundamental matrix will be useless [26].

These outliers will severely affect the precision of the fundamental matrix estimation if they are used directly in the calculation [26]. Additionally, if more correspondences are found than needed for a given algorithm, they may not all be fully compatible with one projective transformation, and the best-fitting transformation will have to be selected from the data [24]. Both detecting outliers and selecting the best-fitting fundamental matrix estimation for all inliers can be done by finding the transformation that minimises some cost function. Two popular robust estimation methods that accomplish these goals are the Random Sample Consensus (RANSAC) method and the Least Median Squares (LMedS) method.

Robust estimation methods

RANSAC (Random Sample Consensus) is a resampling technique that generates candidate solutions for a problem by using subsets of the minimum number of data points required to estimate the underlying model parameters [59]. This minimum number of correspondences depends on the calculation method used: when using a 7-point fundamental matrix algorithm, for example, subsets of seven correspondences are used. The RANSAC technique randomly selects a number of these minimal subsets of point correspondences, determines the fundamental matrix for each subset, and then keeps the fundamental matrix that is most consistent with the entire set of point correspondences [60]. Unlike conventional sampling techniques that use as much of the data as possible to obtain an initial solution and then proceed to prune outliers, RANSAC uses the smallest set possible and proceeds to enlarge this set with consistent data points [59], [61]. Using this resampling technique, a RANSAC algorithm simultaneously finds a solution that agrees with as many data points as possible and removes the outliers.
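The sketch below illustrates this resampling loop for fundamental matrix estimation, reusing OpenCV's 7-point solver for the minimal samples and a simple point-to-epipolar-line distance as the consistency test. It is a simplified outline, not the implementation used in this thesis: the fixed iteration count, threshold, and random seed are illustrative choices and stand in for the usual adaptive termination criterion.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <algorithm>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

// Distance from point p2 to the epipolar line F * p1 (both points in pixel coordinates).
static double epipolarDistance(const cv::Matx33d& F, const cv::Point2f& p1, const cv::Point2f& p2)
{
    cv::Vec3d l = F * cv::Vec3d(p1.x, p1.y, 1.0);
    return std::abs(l[0] * p2.x + l[1] * p2.y + l[2]) / std::sqrt(l[0] * l[0] + l[1] * l[1]);
}

// Simplified RANSAC over 7-point minimal samples.
cv::Matx33d ransacFundamental(const std::vector<cv::Point2f>& pts1,
                              const std::vector<cv::Point2f>& pts2,
                              int iterations = 500, double threshold = 2.0)
{
    std::mt19937 rng(42);
    std::vector<int> idx(pts1.size());
    std::iota(idx.begin(), idx.end(), 0);

    cv::Matx33d bestF;
    int bestInliers = -1;
    for (int it = 0; it < iterations; ++it) {
        // Draw a minimal sample of seven correspondences.
        std::shuffle(idx.begin(), idx.end(), rng);
        std::vector<cv::Point2f> s1, s2;
        for (int i = 0; i < 7; ++i) { s1.push_back(pts1[idx[i]]); s2.push_back(pts2[idx[i]]); }

        // The 7-point solver can return up to three stacked 3x3 solutions.
        cv::Mat Fs = cv::findFundamentalMat(s1, s2, cv::FM_7POINT);
        for (int k = 0; k + 3 <= Fs.rows; k += 3) {
            cv::Matx33d F = Fs.rowRange(k, k + 3);
            // Score the candidate against the entire correspondence set.
            int inliers = 0;
            for (size_t i = 0; i < pts1.size(); ++i)
                if (epipolarDistance(F, pts1[i], pts2[i]) < threshold)
                    ++inliers;
            if (inliers > bestInliers) { bestInliers = inliers; bestF = F; }
        }
    }
    return bestF; // typically refined afterwards using all inliers
}
```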

LMedS (Least Median Squares) is a resampling technique similar to RANSAC except for the way in which it determines the best solution. LMedS evaluates the solution of each subset in terms of the median of the squared errors of the entire data set. Errors are quantified by the distance of a point to its corresponding epipolar line [26]. An LMedS algorithm chooses the solution which minimises this median error of the symmetric epipolar distances of the data set [62].

Both RANSAC and LMedS perform badly when the proportion of outliers rises above 50%, so finding good-quality matches remains important when using a resampling technique [63].

Quality metrics

There exist multiple error criteria that can be used as quality metrics for the estimation of a fundamental matrix. The most widely used error criteria are the reprojection error, the symmetric epipolar distance, and the Sampson distance [64].

• The reprojection error is regarded as the gold standard criterion. The reprojection error for corresponding points x and x′ is defined as the minimum distance needed to bring the 4-dimensional vector (x, x′) into perfect correspondence with the hypersurface implicitly defined by equation (2.2) [24], [64]. The reprojection error can be calculated using Hartley-Sturm triangulation, which involves finding the roots of a polynomial of degree 6, making the reprojection error expensive to compute [64], [65].

• The symmetric epipolar distance is an error criterion that measures the geometric distance of each point to the epipolar line of its corresponding point. This is the most widely used error criterion because it is intuitive and easy to compute [64]. However, it was proven in [64] that the symmetric epipolar distance provides a biased estimate of the gold standard criterion, the reprojection error.


• The Sampson distance provides a first-order approximation of the reprojection error. The Sampson error for corresponding points x and x′ is defined as the distance between the 4-dimensional vector (x, x′) and its Sampson correction. The Sampson error provides a good estimate of the reprojection error for relatively small errors, i.e. reprojection errors smaller than 100 pixels [64]. A small computation sketch is given below.
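As an illustration of how cheap this criterion is to evaluate, the sketch below computes the Sampson error of a single correspondence from the fundamental matrix using the standard first-order formula; it is a generic sketch rather than the exact code of the prototype.

```cpp
#include <opencv2/core.hpp>

// First-order Sampson error of a single correspondence (x, xPrime) under F.
// Both points are given in pixel coordinates and lifted to homogeneous form here.
double sampsonError(const cv::Matx33d& F, const cv::Point2d& x, const cv::Point2d& xPrime)
{
    cv::Vec3d p1(x.x, x.y, 1.0);
    cv::Vec3d p2(xPrime.x, xPrime.y, 1.0);

    cv::Vec3d Fx  = F * p1;          // epipolar line of x in the second image
    cv::Vec3d Ftx = F.t() * p2;      // epipolar line of xPrime in the first image

    double num = p2.dot(Fx);         // the epipolar constraint residual x'^T F x
    double den = Fx[0] * Fx[0] + Fx[1] * Fx[1] + Ftx[0] * Ftx[0] + Ftx[1] * Ftx[1];
    return (num * num) / den;        // first-order approximation of the squared reprojection error
}
```

Averaging this quantity over all correspondences of an image-pair gives the mean Sampson distance used as a quality metric later in this thesis.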

2.2.5 Projective Rectification

Once the epipolar geometry has been recovered through the fundamental matrix, it can be used to calculate a projective transformation H that maps the epipole of an image to a point at infinity [24]. The transformation matrix for this rectification transformation is different for both views of an image-pair. Generally, H is used as the name of the rectification matrix for the left view, while H′ is used as the name of the rectification matrix for the right view. By mapping the epipoles to a point at infinity, the transformations H and H′ will transform a pair of images so that all corresponding epipolar lines become horizontal and at the same height [23], [24].

The projective transformations introduce distortions in the rectified images, more specifically skewness and aspect/scale distortions. The rectification transformations must be chosen carefully to minimise these distortions and to preserve the structure of the original images as accurately as possible. Introducing as few distortions as possible helps during subsequent stages, such as the matching used to calculate the disparity map, by ensuring local areas are not unnecessarily warped. Since the rectification transformation is completely defined by the fundamental matrix, the rectification quality is directly related to the quality of the fundamental matrix [66].

The distortions introduced in the rectified images can be quantified and used as metrics for the quality of the rectification transformation and, since the rectification transformation is completely defined by the fundamental matrix, also as quality metrics for the fundamental matrix estimation [66]. A code sketch that computes both metrics is given after the list:

• The orthogonality of the rectification transformation is a measure for the skewness distortion introduced by the rectification. The orthogonality should ideally be 90◦, and can be calculated as follows [66]:

1. Specify four points a = (w/2, 0, 1)T, b = (w, h/2, 1)T, c = (w/2, h, 1)T, and d = (0, h/2, 1)T with w and h the width and height of the original image.

21

RELATED WORK 2.2 Computer Vision

2. Transform the points a, b, c, and d according to the rectification transformation H: a = Ha, b = Hb, c = Hc, and d = Hd.

3. Define x = b − d and y = c − a.

4. The orthogonality of the rectification is given by the angle between x and y: Eo = cos−1((x · y)/(|x||y|)).

• The aspect ratio of the rectification transformation is a measure for the aspect/scale distortion introduced by the rectification. The aspect ratio should ideally be 1, and can be calculated as follows [66]:

1. Specify the four corner points a = (0, 0, 1)T, b = (w, 0, 1)T, c = (w, h, 1)T, and d = (0, h, 1)T with w and h the width and height of the original image.

2. Transform the points a, b, c, and d according to the rectification transformation H: a = Ha, b = Hb, c = Hc, and d = Hd.

3. Define x = b − d and y = c − a.

4. The aspect ratio of the rectification is given by: Ea = √(xTx / yTy).
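The listing below sketches both metrics for a single rectification homography, following the four steps above. It assumes H is given as a cv::Matx33d and that the transformed points are dehomogenised; it is an illustrative sketch, not the prototype's exact code.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Apply a homography to a homogeneous point and dehomogenise the result.
static cv::Point2d applyH(const cv::Matx33d& H, const cv::Vec3d& p)
{
    cv::Vec3d q = H * p;
    return cv::Point2d(q[0] / q[2], q[1] / q[2]);
}

// Orthogonality Eo (ideally 90 degrees) and aspect ratio Ea (ideally 1) of a rectification H,
// for an original image of size w x h.
void rectificationMetrics(const cv::Matx33d& H, double w, double h, double& Eo, double& Ea)
{
    // Orthogonality: the edge midpoints of the original image.
    cv::Point2d a = applyH(H, {w / 2, 0, 1}), b = applyH(H, {w, h / 2, 1});
    cv::Point2d c = applyH(H, {w / 2, h, 1}), d = applyH(H, {0, h / 2, 1});
    cv::Point2d x = b - d, y = c - a;
    Eo = std::acos((x.x * y.x + x.y * y.y) /
                   (std::hypot(x.x, x.y) * std::hypot(y.x, y.y))) * 180.0 / CV_PI;

    // Aspect ratio: the four corners of the original image.
    a = applyH(H, {0, 0, 1}); b = applyH(H, {w, 0, 1});
    c = applyH(H, {w, h, 1}); d = applyH(H, {0, h, 1});
    x = b - d; y = c - a;
    Ea = std::sqrt((x.x * x.x + x.y * x.y) / (y.x * y.x + y.y * y.y));
}
```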

Since all points map to a horizontal epipolar line at the same height in the other image, corresponding scene points in both images will be located close to each other after rectification. Any remaining disparity between matching points will be along the horizontal epipolar line, which greatly simplifies the calculations needed to retrieve the disparity map between both images [23], [24].

2.2.6 Disparity Map

After transforming a pair of images by the appropriate projective rectification transformation, the original situation is reduced to the epipolar geometry produced by a pair of identical cameras placed side by side with their principal axes parallel. Because of this rectification transformation, the search for matching points is vastly simplified by the simple epipolar structure and the near-correspondence of the two images [24]. This allows a disparity map to be calculated between both images. A disparity map represents the distance between corresponding pixels that are horizontally shifted between the left image and the right image [67]. Figure 2.7 shows one of the ground truth disparity maps that is part of the Tsukuba dataset.


Figure 2.7: One of the disparity maps from the Tsukuba dataset [68].

In general, stereo vision disparity map algorithms can be classified into two categories based on their approach to computing the disparity values:

• Algorithms using a local approach, also known as area-based or window-based approaches. In these algorithms, the disparity computation at a given point depends only on the intensity values within a predefined support window. Thus, these methods consider only local information and as such have a lower computational complexity and a shorter run time. For each pixel in the image, the corresponding disparity value with the minimum cost is assigned. The matching cost is aggregated via a sum or an average over the support window [67].

• Algorithms using a global approach treat disparity calculation as a problem of minimising a global energy function for all disparity values at once. These global energy functions usually consist of two terms: a data term, which penalises solutions that are inconsistent with the target data, and a smoothness term, which enforces the piece-wise smoothing assumption with neighbouring pixels. This smoothness term is designed to retain smoothness in disparity among pixels in the same region. Global methods produce good results but are computationally expensive and thus have a longer run time [67].

In [69], Hirschmüller describes a method he calls Semiglobal Matching, which uses a mix between a local and a global approach. Hirschmüller's Semiglobal Matching algorithm uses a pixelwise, mutual information-based matching cost to compensate for differences between the input images. Instead of minimising a true global cost function, as a global approach would do, Semiglobal Matching performs a fast approximation of a global cost function by using pathwise optimisations in all directions starting with the pixel being evaluated. Hirschmüller states that the number of paths used to approximate the global cost function should be at least eight, but suggests using 16 paths to provide good coverage of a 2D image [69].

Once a disparity value has been calculated for every pixel in an image, a refinement pass can be done over the entire disparity map. The purpose of the refinement pass is to reduce noise and improve the quality of the disparity map. Typically, the refinement step consists of regularisation and occlusion filling or interpolation. The overall noise is reduced by filtering out inconsistent pixels and small variations among pixels in the disparity map. In areas in which the disparity is unclear, occlusion filling or interpolation is used. Occluded regions are typically filled in with disparities similar to those of the background or textureless areas. Occluded regions are detected by using left-right consistency checks. If the matching algorithm rejects disparities with low confidence, then an interpolation algorithm can estimate approximations to the correct disparities based on their local neighbourhoods [67].

The disparity value of a pixel is inversely proportional to its depth value [70]. Disparity values can thus be associated with depth values through three-dimensional projection [67]. A disparity map can therefore be transformed into a depth map of the original scene, which can be translated directly into a point cloud. The reconstructed scene, however, is determined by its images only up to a similarity transformation: the rotation, translation, and scaling of the scene as a whole relative to the world cannot be determined from its images alone [24].


Chapter 3

Methodology

The idea behind the proposed method is to draw as much as possible on existing multi-view computer vision methods. However, there are three main differences between the proposed method and more classical multi-view computer vision methods.

Firstly, the different model sheet views are formed by rotating the subject around its centre, while most classical computer vision problems form their different views by moving the camera in front of the subject. The different model sheet views could also be interpreted as a camera rotation around the subject at a set distance, while the more common, classical problem could be interpreted as a camera translation at a set distance, although this translation is sometimes combined with a slight rotation.

Secondly, existing computer vision methods use a lot of texture information. Model sheets are usually black-and-white line drawings, so they contain no texture information. Moreover, model sheets are also very sparse compared to other types of images: they consist of large, empty regions, which makes the feature detection step a lot more difficult as there are fewer features to detect. This can be clearly seen in figure 3.1, which contains both images from a well-known computer vision dataset and a typical model sheet.

Lastly, model sheets are drawn using an orthographic projection, while computer vision methods often use perspective projections, as that is how both the human visual system and real cameras work. This will make some parts of the proposed method a bit easier, as there is no distortion in the model sheets caused by a perspective projection, but it will also make other parts trickier, as a lot of existing computer vision algorithms work with perspective projections.


(a) A more classical computer vision dataset (the Tsukuba University Dataset [68]).

(b) Model sheets used in the production of the movie Atlantis: The Lost Empire. © 2001 The Walt Disney Company.

Figure 3.1: A comparison between more classical computer vision datasets and model sheets


3.1 Overview

Figure 3.2 shows an overview of the method proposed in this dissertation. The proposed method requires at least three different model sheet views of the same model, all at different angles, but these views do not necessarily need to be equiangular. Pairs are created from these model sheet views, each pair containing two adjacent angles. For each of those image-pairs, the underlying epipolar geometry is recovered through the fundamental matrix and used to reconstruct a disparity map for each view in that pair. After this, all disparity maps for the same view that were generated in different image-pairs are combined to get a better disparity map for that view. This disparity map is then converted to a 3D point cloud. If desired, the final point clouds could then be merged together and a convex hull could be generated for the combined point cloud to get a full 3D object.

This chapter will go through each of these steps, starting with the model sheet requirements, explain how each step works, and describe what data is used to communicate between the different steps. Finally, an overview will be given of the different frameworks and software that were used while building a prototype implementing the proposed method.

3.2 Model Sheets

In order to be able to evaluate the proposed method, it would be useful to generate custom model sheets. When using generated model sheets, the reconstructed 3D model can be compared to the original 3D model used to generate the model sheets in order to evaluate the proposed method.

Figure 3.2: The different components of the proposed method (model sheets → corner detection and matching → epipolar geometry → image rectification → disparity map → point cloud → convex hull), the input and output at each step, and the feedback loops from the epipolar geometry and rectification quality metrics that can be used to optimise an implementation.

The way 3D objects were rendered to generate the model sheets used in this thesis is by using the "Freestyle" function in Blender 2.79's Cycles renderer. A black Freestyle line with an absolute thickness of 1.5 pixels and a crease angle left at the default 134.43◦ was combined with a completely white object material with its emission strength set to 1, so the 3D model's faces would be uniformly white, mimicking the typical black-and-white line-art style of real model sheet drawings. Since model sheets are drawn without perspective [71], an orthographic camera was used with scale 7.3, placed at a horizontal (XY) distance of 8 units from the centre of the model. The camera is always rotated to look straight at the centre of the model. The camera's near and far clipping planes were left at 0.1 and 100.0, the default Blender values. The renderer and camera settings were chosen by visually comparing the renders to actual model sheets, and by trying to make them as visually similar as possible. The model sheet views are rendered with a colour depth of 8 bits, no compression, and at a resolution of 1920 by 1080 pixels, but scaled down by 50% after rendering, so the final images have a resolution of 960 by 540. The images are scaled down after rendering to reduce memory requirements during execution of the method prototype and to reduce the effect of aliasing in the final images. Model sheet renders were generated for a set of simple 3D objects with the camera rotating around the models in angles from 0 to 90 degrees with increments of 5 degrees. As illustrated in figure 3.3, the model sheets generated in this manner visually resemble actual model sheets.

Figure 3.3: Blender model sheet-like renders for a very basic 3D character.

In order to be able to compare generated model sheets to real model sheets, a model sheet resembling the one in figure 1.2 was recreated using this method. The generated model sheet can be seen in figure 3.4 and uses a slightly different, but similar, model to the model sheets in figure 1.2. In order to make the comparison a bit easier, the generated model sheet uses two different colours, similar to those in figure 1.2, instead of the pure black-and-white model sheets that will be used in the remainder of this thesis. By comparing figures 1.2 and 3.4, it is clear that the hand-drawn model sheets contain more details and – in this case – also some slight shading, which is missing from the generated model sheets. The shading and colouring in figure 1.2 is not present in most model sheets, though, as can be seen in figures 1.1a and 3.1b.

Figure 3.4: A recreation of the model sheets from figure 1.2, albeit using a slightly different mesh.


For each model sheet view, the corresponding ground-truth disparity map was rendered with the same camera settings, the same camera position, and at the same resolution. Since the generated model views use orthographic projections, and since the projected disparity is inversely proportional to the depth of the 3D model (see section 2.2.6), disparity maps can be generated by inverting depth maps of the models. The disparity map was generated by using Blender's compositing tools to map the normalised depth of the model to a linear white-to-black gradient, as can be seen in figure 3.6. To reduce visual artefacts in the depth map caused by limitations of Blender's compositing tools, an alpha scissor was used to remove all pixels with an alpha value below 0.9. The colour profile for the depth map renders was set to "None" instead of Blender's default sRGB profile, to make the depth map image linear in its grayscale values instead of logarithmic. One of the resulting ground-truth depth maps can be found in figure 3.5.

Figure 3.5: One of the ground-truth depth maps rendered with Blender 2.79.

Hand-drawn model sheets will need to be cleaned up before they can be used in the proposed method. They should no longer contain pencil markings and measurements such as those on figure 3.1b. The subject should be in the exact same pose and should have the exact same expression in each view, which isn't always the case in model sheets used in stylistically animated movies. Inconsistent regions in one of the model sheet views can throw off the epipolar geometry and the disparity map stages of the proposed method.

Figure 3.6: Blender's node-based compositing tool as it was set up to generate the ground-truth depth maps.

Since the prototype for the proposed method uses OpenCV's implementations of the necessary computer vision algorithms, the model sheets first needed to be converted to 8-bit grayscale images before the proposed method could be applied to them, as many of OpenCV's implementations only work with 8-bit grayscale images. For most model sheets this doesn't change anything visually, since they already consisted solely of black and white. Depending on the file format of the original model sheet images, this conversion might decrease the bit depth of the image, but it won't have an effect on the result of the proposed method.
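A minimal sketch of this conversion step, assuming the model sheet views are stored as image files on disk; the function name and path handling are purely illustrative.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>

// Load a model sheet view as a single-channel 8-bit image, as expected by most of the
// OpenCV routines used in the prototype.
cv::Mat loadModelSheet(const std::string& path)
{
    // IMREAD_GRAYSCALE decodes directly to an 8-bit, single-channel image.
    cv::Mat sheet = cv::imread(path, cv::IMREAD_GRAYSCALE);
    CV_Assert(!sheet.empty());
    return sheet;
}
```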

Moreels found in [41] that exponentially fewer features can be matched between views of the same scene as the rotation angle between those views increases, with most algorithms having trouble matching features for viewpoint changes beyond 30◦, as illustrated in figure 3.7. Gao found in [63] that algorithms have a hard time recovering a good estimation of the epipolar geometry when the percentage of features discarded as outliers gets near, or above, 50%. Combining Gao's findings with Moreels' findings illustrated in figure 3.7, it is clear that model sheets should ideally not be rotated more than 10◦ between adjacent views.

3.3 Corner Detection and Matching

As described in section 2.2, the first step of any computer vision method is to find as many corresponding corners between the different views as possible. These correspondences will then be used to determine the fundamental matrix describing the epipolar geometry that underlies these views. In more classic multi-view computer vision problems, there is usually an abundance of features to detect and match. But, as discussed at the start of this chapter, model sheets contain no texture information that can be used by the feature detector. Model sheets only contain edges and large, blank spaces, which makes the feature detection stage a lot harder for model sheets.

Figure 3.7: Performance of different feature detectors for different viewpoint changes. Graph reproduced from [41].

As the proposed method will use a 7-point RANSAC algorithm to calculate an estimation of the fundamental matrix for each image-pair, at least 7 corresponding points are needed between each pair of adjacent views. As explained in more detail in section 2.2.4, 7 point correspondences is the bare minimum necessary to make a 7-point RANSAC algorithm work, but adding more correspondences is desirable since it gives the RANSAC algorithm more room for testing different combinations of correspondences, and thus different fundamental matrices, increasing the quality of the final estimated fundamental matrix.

Of the corner detection algorithms described in section 2.2.2, the Shi-Tomasi detector works best with model sheets since it doesn't use a lot of texture information when detecting corners and produces the most stable result. The performance of the Shi-Tomasi detector compared to other corner detection algorithms when used on model sheets will be discussed later in section 4.1.

After a list of features has been found for each view, these features need to be matched between adjacent image-pairs. In order to be able to match features between images, the region around every feature in every view must first be analysed using a feature descriptor, as explained in section 2.2.2. The proposed method uses the descriptor of the ORB algorithm, which is actually a modified version of the BRIEF descriptor called rBRIEF (rotated BRIEF) [47]. The ORB descriptor works by computing binary strings for each feature by comparing the intensities of pairs of points around that feature. This method is not only very fast, but also performs well on model sheets, as will be shown later in the evaluation of this stage in section 4.1.

After the descriptors have been computed for every feature detected in every view, a feature matching algorithm must be used to match features between views. Because the number of features that can be reliably found in most model sheets is relatively low (in the order of tens to a few hundred), the proposed method uses a brute-force matching algorithm with cross-checking to increase the robustness of the feature matches, as described in section 2.2.3. Because ORB descriptors were used, the brute-force matching algorithm should use the Hamming distance to calculate the distance between two features [47].

Once a list of matching point correspondences between all adjacent model sheets has been obtained using the Shi-Tomasi detector combined with ORB descriptors and a brute-force, cross-checking matching algorithm, those points can be used to calculate an estimation of the fundamental matrix for each pair of views. These fundamental matrices define the underlying epipolar geometry for each image-pair. The sketch below summarises the detection, description, and matching pipeline.
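A condensed sketch of this stage, using OpenCV's Shi-Tomasi detector (GFTTDetector), the ORB descriptor, and a cross-checking brute-force matcher. The parameter values shown here are illustrative defaults rather than the exact tuned values reported in table 4.3.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// Detect Shi-Tomasi corners, describe them with ORB, and match them between two views.
std::vector<cv::DMatch> matchModelSheetPair(const cv::Mat& left, const cv::Mat& right)
{
    // Shi-Tomasi corners (GFTTDetector with useHarrisDetector = false).
    cv::Ptr<cv::GFTTDetector> detector =
        cv::GFTTDetector::create(/*maxCorners=*/1000, /*qualityLevel=*/0.1,
                                 /*minDistance=*/2, /*blockSize=*/11,
                                 /*useHarrisDetector=*/false);
    std::vector<cv::KeyPoint> kpLeft, kpRight;
    detector->detect(left, kpLeft);
    detector->detect(right, kpRight);

    // ORB (rBRIEF) binary descriptors around each detected corner.
    cv::Ptr<cv::ORB> describer = cv::ORB::create();
    cv::Mat descLeft, descRight;
    describer->compute(left, kpLeft, descLeft);
    describer->compute(right, kpRight, descRight);

    // Brute-force matching with the Hamming norm and cross-checking enabled.
    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(descLeft, descRight, matches);
    return matches;
}
```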

3.4 Epipolar Geometry

To calculate an estimate of the fundamental matrix for each pair, this method proposes to use a 7-point RANSAC algorithm. This way, the calculated fundamental matrix – if found – should be a good estimation of the fundamental matrix underlying the calculated set of correspondences.

As described in section 2.2.4, there exist a couple of different error criteria that can be derived from the fundamental matrix at this point. The mean of any of these error criteria over all point correspondences can be used as a metric for the quality of the calculated fundamental matrix estimation. As will be shown later in section 4.2, the proposed method uses the mean Sampson distance as a quality metric for the fundamental matrix estimation, among others. If the mean Sampson distance is above a certain value (around 100 pixels appears to be a good error threshold), the corner detection and matching stage, and all following stages, should be rerun with slightly altered parameters. By repeating this process until the quality metric passes the chosen threshold, an implementation of the proposed method can use this technique to optimise the method for the specified model sheets with each iteration. If no good fundamental matrix estimation can be found after a set number of iterations, the best result can be used for the remainder of the method, or the quality of the model sheets can be re-evaluated against the criteria described in section 3.2.
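The sketch below shows how this stage could look with OpenCV's findFundamentalMat (whose RANSAC mode draws 7-point minimal samples internally), combined with the Sampson-based quality check. The 100-pixel threshold follows the discussion above; sampsonError() refers to the illustrative helper sketched in section 2.2.4, and the exact parameter handling of the prototype may differ.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Assumed to exist: the Sampson error helper sketched in section 2.2.4.
double sampsonError(const cv::Matx33d& F, const cv::Point2d& x, const cv::Point2d& xPrime);

// Estimate F for one image-pair and report whether the estimate passes the quality check.
bool estimateFundamental(const std::vector<cv::Point2f>& pts1,
                         const std::vector<cv::Point2f>& pts2,
                         cv::Matx33d& F, double maxMeanSampson = 100.0)
{
    cv::Mat inlierMask;
    cv::Mat Fmat = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC,
                                          /*ransacReprojThreshold=*/2.0,
                                          /*confidence=*/0.99, inlierMask);
    if (Fmat.empty())
        return false;
    cv::Matx33d Festimate = Fmat;   // Mat -> Matx conversion
    F = Festimate;

    // Mean Sampson distance over the inlier correspondences.
    double sum = 0.0;
    int count = 0;
    for (size_t i = 0; i < pts1.size(); ++i) {
        if (!inlierMask.at<uchar>(static_cast<int>(i)))
            continue;
        sum += sampsonError(F, pts1[i], pts2[i]);
        ++count;
    }
    // If the check fails, the caller reruns detection and matching with altered parameters.
    return count > 0 && (sum / count) <= maxMeanSampson;
}
```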

3.5 Image Rectification

Once the epipolar geometry has been recovered for each image-pair through a fundamental matrix estimation, this information can be used to determine projective transformations H1 through Hn and H′1 through H′n, where n is the number of image-pairs evaluated. These projective transformations rectify all model sheet pairs as described in section 2.2.5. In each pair of rectified images, all corresponding epipolar lines become completely horizontal and at the same height in both images. This transformation makes it a lot easier to calculate the disparity map in the next step. Figure 3.8 shows two rectified model sheets and their rectified point correspondences.

Figure 3.8: A rectified image-pair with its rectified matches.
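A sketch of how such a pair of rectifying homographies can be obtained and applied with OpenCV, given the inlier correspondences and the estimated fundamental matrix; this mirrors the approach described above but is not necessarily the prototype's exact code, and the white border value is only an assumption that matches the blank model sheet background.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Rectify one image-pair so that corresponding epipolar lines become horizontal
// and lie at the same height in both views.
void rectifyPair(const cv::Mat& left, const cv::Mat& right,
                 const std::vector<cv::Point2f>& inliers1,
                 const std::vector<cv::Point2f>& inliers2,
                 const cv::Mat& F,
                 cv::Mat& rectLeft, cv::Mat& rectRight, cv::Mat& H1, cv::Mat& H2)
{
    // Compute the rectifying homographies H and H' from the uncalibrated geometry.
    cv::stereoRectifyUncalibrated(inliers1, inliers2, F, left.size(), H1, H2);

    // Warp both views with their respective homographies.
    cv::warpPerspective(left, rectLeft, H1, left.size(), cv::INTER_LINEAR,
                        cv::BORDER_CONSTANT, cv::Scalar(255));
    cv::warpPerspective(right, rectRight, H2, right.size(), cv::INTER_LINEAR,
                        cv::BORDER_CONSTANT, cv::Scalar(255));
}
```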

After each image-pair has been rectified, it is possible to derive two new quality metrics from this transformation in the form of the orthogonality and the aspect ratio of the rectification transformation, as described in more detail in section 2.2.5. Next to the quality of the transformation itself, these metrics also quantify the quality of the original point correspondences and of the fundamental matrix estimation. Both metrics will be used later on in section 4.3. Similar to the quality metrics derived in the previous stage of the proposed method, these metrics could also be used to optimise an implementation of the proposed method for the specified model sheets by performing a form of gradient descent, similar to that described at the end of section 3.4.


3.6 Disparity Map

Once each image-pair has been rectified, the rectified model sheets can be used to calculate two disparity maps for each image-pair. A disparity map shows the distance between pixels in one image of a pair and the pixels that represent the same point of the pictured 3D scene in the other image. Disparity maps are described in more detail in section 2.2.6. For each image-pair, one disparity map can be calculated for the disparity from the left to the right view, and one for the disparity from the right to the left view. Since each model sheet is part of two different image-pairs (except for the left-most and right-most views), two different disparity maps can be calculated for each model sheet. The disparity maps representing the same model sheet view can be combined by taking the mean of both. As will be shown later in section 4.4, the quality of the combined disparity map is higher than the quality of either original disparity map separately.

The proposed method uses Hirschmüller's Semiglobal Matching algorithm, described in section 2.2.6, with some minor computational optimisations and in combination with a few of the pre- and post-filtering methods described in the same section. The optimisations consist of making the algorithm single-pass and thus only considering five directions instead of eight, matching blocks instead of individual pixels, and using a simpler cost metric than the one described by Hirschmüller. Before calculating the disparity map, a Sobel operator is applied to the rectified images and the resulting derivative is clipped to a set interval to reduce noise. After different disparity values have been calculated for a pixel, a uniqueness check is done. The uniqueness check defines a margin by which the minimum cost should "win" from the second-best value before it is validated. If a pixel fails the uniqueness check, its disparity value is set to zero and the quadratic interpolation step will later attempt to fill it in based on the final disparity values of its neighbours. After all pixels have been evaluated with this uniqueness check, different post-filtering methods are applied to the disparity values in a refinement pass. The applied post-filtering methods consist of quadratic interpolation, to fill in areas that are missing disparity values (either because of occlusions or because they failed the uniqueness check), and speckle filtering, to smooth the disparity map and filter out inconsistent pixels.
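The behaviour described above corresponds closely to OpenCV's StereoSGBM implementation; the sketch below shows how such a matcher could be configured. The parameter values are illustrative rather than the tuned values of the prototype: preFilterCap clips the Sobel prefilter output, uniquenessRatio implements the uniqueness margin, and the speckle parameters drive the speckle filter mentioned above.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// Compute a left-to-right disparity map for a rectified pair using semiglobal block matching.
cv::Mat computeDisparity(const cv::Mat& rectLeft, const cv::Mat& rectRight)
{
    const int blockSize = 5;            // matching blocks instead of individual pixels
    const int numDisparities = 64;      // must be a multiple of 16
    cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(
        /*minDisparity=*/0, numDisparities, blockSize,
        /*P1=*/8 * blockSize * blockSize,    // smoothness penalties of the SGM cost
        /*P2=*/32 * blockSize * blockSize,
        /*disp12MaxDiff=*/1,                 // left-right consistency tolerance
        /*preFilterCap=*/31,                 // clipping interval for the Sobel prefilter
        /*uniquenessRatio=*/10,              // margin (%) for the uniqueness check
        /*speckleWindowSize=*/100,           // speckle filtering of inconsistent regions
        /*speckleRange=*/2);

    cv::Mat disparity16;                     // fixed-point disparities, scaled by 16
    sgbm->compute(rectLeft, rectRight, disparity16);

    cv::Mat disparity;
    disparity16.convertTo(disparity, CV_32F, 1.0 / 16.0);
    return disparity;
}
```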

Since the resulting disparity maps are calculated from the rectified model sheets, they need to be derectified to get the disparity maps for the original model sheets. This can be done easily by applying the inverse of the transformations H1 - Hn and H′1 - H′n that were used in the rectification step to each respective disparity map. The final disparity map generated with this method is shown in figure 3.9. Figure 3.10 contains the disparity map generated for the real model sheets shown in figure 1.2.

Figure 3.9: A disparity map retrieved with the proposed method.
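In OpenCV this inverse warp does not require explicitly inverting the homography: warpPerspective accepts a WARP_INVERSE_MAP flag, as in the short sketch below. Nearest-neighbour interpolation is used here so that disparity values are not blended across object boundaries; this is an illustrative choice, not necessarily the prototype's.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Map a disparity map computed in rectified coordinates back onto the original model sheet.
cv::Mat derectifyDisparity(const cv::Mat& rectifiedDisparity, const cv::Mat& H)
{
    cv::Mat original;
    cv::warpPerspective(rectifiedDisparity, original, H, rectifiedDisparity.size(),
                        cv::INTER_NEAREST | cv::WARP_INVERSE_MAP);
    return original;
}
```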

3.7 Point Cloud and Convex Hull

The disparity value for each pixel can be associated with depth values through three-dimensional projection. This allows us to transform each disparity map into a point cloud.

The grayscale value of each pixel in each disparity map corresponds to the relative disparity, between the model sheet views used to create that disparity map, of the feature that pixel is a part of. This means that the value of every pixel in a disparity map is inversely proportional to the depth of that point, as explained in section 2.2.6. The disparity map can be transformed into a point cloud for each corresponding model sheet by taking the multiplicative inverse of each disparity value at each pixel. The resulting point cloud is only defined up to a similarity transformation, since it is impossible to determine the rotation, translation, and scaling of the reconstructed scene relative to the world using only images of the scene [24].
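A sketch of this conversion, assuming an orthographic view so that the pixel coordinates can be used directly as x and y. The depth scale factor and the handling of zero (invalid) disparities are illustrative choices, and the result is only defined up to a similarity transformation, as noted above.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Convert a (derectified) disparity map into a point cloud, up to an unknown similarity
// transformation. Pixels with non-positive disparity are treated as background and skipped.
std::vector<cv::Point3f> disparityToPointCloud(const cv::Mat& disparity, float depthScale = 1.0f)
{
    CV_Assert(disparity.type() == CV_32F);
    std::vector<cv::Point3f> cloud;
    for (int v = 0; v < disparity.rows; ++v) {
        for (int u = 0; u < disparity.cols; ++u) {
            float d = disparity.at<float>(v, u);
            if (d <= 0.0f)
                continue;                       // no reliable disparity for this pixel
            // Orthographic views: x and y come straight from the pixel grid,
            // depth is the multiplicative inverse of the disparity.
            cloud.emplace_back(static_cast<float>(u), static_cast<float>(v), depthScale / d);
        }
    }
    return cloud;
}
```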

Figure 3.10: A disparity map generated from the model sheets shown in figure 1.2.

After transforming the disparity maps to point clouds, there is a separate point cloud for each model sheet view. Using the correspondences retrieved in the feature detection and matching stage in section 3.3, these point clouds could all be combined into one cloud representing the entire reconstructed scene. At least three noncollinear correspondences are needed between the point clouds of adjacent views to be able to determine their orientation and position relative to each other. Two point clouds can be merged by transforming all points in one cloud in such a way that the corresponding points between both point clouds overlap, before simply combining all points of both point clouds into one, bigger point cloud. If desired, it is possible to turn the generated point cloud into a triangulated three-dimensional mesh by using an algorithm that calculates a convex hull for that point cloud.

3.8 Frameworks

In order to build a prototype implementing the proposed method, different applications, frameworks, and libraries were used. The most important of these was the C++ version of OpenCV 4.2.0. Using OpenCV helps a lot when building a quick prototype implementing computer vision methods, since many methods are provided by the framework. Many of OpenCV's functions lack proper documentation, however, and its API can be difficult to work with since it drags along almost 20 years of legacy code; this is especially apparent while one is still unfamiliar with the framework.

In order to generate the model sheet approximations used to evaluate the proposed method, Blender 2.79 was used. Blender is a full open-source 3D creation suite. It has a big online community, which helps a lot when looking for something very specific. Blender 2.79's user interface is not very accessible, but this has been improved upon a lot with the release of Blender 2.80 and subsequent versions.

To easily view and manipulate the point clouds generated from disparity maps in the prototype of the proposed method, Meshlab 2016.12 was used. Meshlab is an open-source mesh processing system that specialises in managing and processing large, unstructured meshes, which makes it well suited for point clouds. At the time of writing, Meshlab hasn't been updated in almost four years and has some issues on modern systems because of this, but most functionality needed in this thesis still works.

Other libraries used in the development of the prototype for the proposed method include Dear ImGui 1.7 for the user interface, and SDL 4435 (combined with GLAD 0.1.29) to initialise the application window and the OpenGL surface used by Dear ImGui.


Chapter 4

Method Evaluation

In this chapter, each step of the proposed method will be evaluated. First, an overview will be given of the quality metrics that will be used in the rest of this chapter. Then, the alternative algorithms and methods that were explored before settling on the methodology described in this thesis will be briefly explained and evaluated on a set of different model sheets. This chapter will further clarify why the methods and algorithms described in the previous chapter were chosen over their alternatives. Lastly, the full method described in this thesis will be evaluated on the model sheets generated in section 3.2.

4.1 Corner Detection and Matching

As described in section 3.3, at least 7 point correspondences are needed between each image-pair, with more correct correspondences increasing the quality of the fundamental matrix estimation. To increase the correctness of the corner matches, additional feature detection algorithms may be used to select corresponding corners. One of these additional methods might even be manual selection. Manual feature selection was part of the first prototype constructed for the proposed method but was later removed in favour of automatic feature detection, since manually selecting more than seven features per view quickly became tedious, and the results weren't always perfect, as it is very hard to select pixel-perfect point correspondences manually. Manual selection could still be used to optimise an implementation of this method, e.g. by allowing the user to manually remove, add, or adjust the automatic feature matches.


4.1.1 Quality metrics

Since point correspondences completely define the fundamental matrix estimation calculated from them (see section 2.2.4), and given that the performance of the rectification transformation is directly related to the quality of the fundamental matrix estimation [66], quality metrics for the point correspondences can be retrieved from both of these stages. There are three quality metrics that can be derived from the estimated fundamental matrix and two that can be derived from the rectification transformation:

The first quality metric derived from the fundamental matrix is the Sampson distance described in section 2.2.4. The smaller the mean Sampson distance is over all correspondences, the smaller the mean reprojection error of the fundamental matrix estimation will be. In [64], a Sampson distance of 100 pixels is used as the threshold between small and large errors. In this evaluation, a Sampson distance threshold is not used; rather, the Sampson distances of the different methods will be compared, with a smaller mean Sampson distance being preferred.

A second useful quality metric is the absolute size of the smallest eigenvalue of the estimated fundamental matrix. We know that for the fundamental matrix F, equation (2.1) holds: det F = 0. Linear algebra teaches us that the determinant of a matrix equals the product of its eigenvalues. Therefore, at least one of the eigenvalues of F must always be 0. But, since we're not working with the actual matrix F but rather with an estimation F′, equation (2.1) won't always hold exactly; there might be small errors. If det F′ = 0 is not satisfied, this can cause inconsistencies of the epipolar geometry near the epipoles [26]. We can quantify this error by checking the value of the smallest absolute eigenvalue of F′. The closer this eigenvalue is to zero, the closer det F′ will be to zero, and the closer our estimated fundamental matrix will be to the correct one.

The third quality metric that can be derived from the fundamental matrix is the outlier ratio of the detected correspondences. The outlier ratio is defined as the ratio of the number of matches rejected as outliers over the total number of matched points between two images. Because of the way the RANSAC algorithm works (see section 2.2.4), the number of outliers will decrease if the quality of the matched points increases. The more point correspondences that match exactly, the more correspondences will agree with the calculated fundamental matrix estimation, and the fewer outliers there will be. As described in section 2.2.4, the number of outliers should be below 50% to be able to retrieve a reliable estimation of the fundamental matrix.


There are two quality metrics that can be derived from the rectification transformation, Eo and Ea. As described in more detail in section 2.2.5, Eo represents distortions in the orthogonality of the images caused by the rectification transformation, while Ea represents distortions in the aspect ratio/scale of the images caused by the rectification transformation. If the rectification transformation introduced no distortion errors, Eo would be 90◦, while Ea would be 1. Distortions introduced by the rectification transformation can both increase or decrease the values of Eo and Ea, which is why these metrics will be expressed as absolute values relative to their ideal values in this evaluation, i.e. relative to 90◦ for Eo, and relative to 1 for Ea.

Detector      Class             Mean Sampson distance   Smallest eigenvalue   Outlier ratio
FAST          Template based    53.4                    2.2e-5                17/80 (21.3%)
ORB           Template based    935.4                   2.0e-9                38/194 (19.6%)
Harris        Gradient based    0.1                     1.5e-8                1/33 (3.0%)
Shi-Tomasi    Gradient based    0.3                     1.9e-9                1/35 (2.9%)
SURF          Region detector   323.8                   2.3e-4                15/109 (13.8%)

Table 4.1: A comparison of the performance of different feature detectors on model sheets using the ORB descriptor and a brute-force matching algorithm.

Table 4.1 shows an overview of the quality metrics retrieved from the fundamental matrix for the specified detectors using an ORB descriptor and a brute-force matching algorithm. As described earlier, all three of these metrics should be low to ensure that a reliable estimation can be found for the fundamental matrix. It is clear from table 4.1 that the gradient based detectors score considerably better than the other detectors on all metrics. That the outlier ratio of the gradient based detectors is so much lower than that of the other detectors is by design, however, as both of these algorithms contain post-processing steps to cull the correspondences with the lowest quality.

Figure 4.1 shows the matched features for the evaluated detectors after rectification, as well as the two quality metrics that can be derived from the rectification transformation, Eo and Ea (the orthogonality and aspect-ratio/scale distortions described in section 2.2.5). Both Eo and Ea are expressed here as the mean error relative to their ideal values, i.e. 90◦ for Eo and 1 for Ea. It is not always possible to retrieve the absolute mean error for Eo and Ea since these distortions can go in all directions, so their absolute mean error would appear closer to the ideal values than it actually is. Figure 4.1 quickly shows that the gradient based detectors (Harris and Shi-Tomasi) and the ORB detector score considerably better on model sheets than the other detectors for both metrics.

(a) Using a FAST corner detector: Eo = 47.1%, Ea = 27.4%.
(b) Using an ORB corner detector: Eo = 0.6%, Ea = 0.7%.
(c) Using a Harris corner detector: Eo = 0.8%, Ea = 1.0%.
(d) Using a Shi-Tomasi corner detector: Eo = 0.4%, Ea = 0.8%.
(e) Using a SURF region detector: Eo = 5.3%, Ea = 7.6%.
(f) The original, undistorted images.

Figure 4.1: Rectified matches for one of the generated model sheets for different feature detectors, with the corresponding rectification quality metrics. The rotation angle between both model sheet views is 5◦.

For all results in both table 4.1 and figure 4.1, the feature detector was the only part of the evaluated method that changed. For these tests, the feature descriptor was an ORB descriptor and the matching algorithm was a brute-force matcher. The results of these tests were calculated by taking the mean value of the specified metrics over the three model sheet views shown in figure 4.2. The rotation angle between all adjacent views was 5◦. This model sheet was chosen for the tests since it consists of four different primitives: a sphere for the head, boxes for the body and legs, a cone for the hat, and a cylindrical stick across its chest. Using the model sheet in figure 4.2 thus allows multiple primitives to be tested at once. The feature descriptor and matcher will be evaluated separately later in this chapter.

Figure 4.2: The model sheet views used in this test. From left to right, the camera rotation is 40◦, 45◦, and 50◦.

For testing, the OpenCV implementations were used for all feature detectors and descriptors evaluated in this and the next sections. An overview of the exact implementations used, and their parameters, can be found in table 4.3. The values of the parameters in table 4.3 were determined by manual iterative optimisation, iterating through different values for each parameter and optimising them for the five metrics described earlier. All omitted parameters were left at their default values, as determined by the OpenCV framework version 4.2.0 [72].

Table 4.1 and figure 4.1 do not contain results for the SIFT detector even though it is one of the detectors that was evaluated. The SIFT detector was omitted from the results in table 4.1 and figure 4.1 as it is incompatible with the ORB descriptor that was used to test the other detectors in those comparisons. The SIFT detector will still be compared with the two other descriptors further on in this chapter, which is why it appears in table 4.3. The results from table 4.1 and figure 4.1 will now be evaluated in more detail.

4.1.2 Template based detectors

Two template based corner detectors were evaluated: FAST and ORB. The FAST detector uses decision trees to considerably speed up corner detection, making it very useful for real-time applications [32] and a widely used corner detector [47]. However, the FAST detector lacks operators that appear to be important for detecting corners on model sheets. The FAST detector does not contain an orientation operator, which assigns an orientation to each feature, allowing the descriptor and matcher algorithms to match features that may have rotated between two views. The FAST algorithm also does not produce a measure of cornerness, which sometimes causes it to detect corners along straight edges. And lastly, the FAST detector does not contain a scaling operator, which causes it to miss some large corners, such as three out of four corners of the body of the character in figure 4.1a, as well as the three corners of its hat.

Compared to the other detectors shown in figure 4.1, the FAST detector clearly causes the most distortion in the model sheets after rectification. FAST performs the worst of all evaluated feature detectors for both rectification metrics. However, looking at the fundamental matrix metrics in table 4.1, the FAST algorithm seems to score slightly better. FAST outperforms both the ORB and SURF detectors when comparing the mean Sampson distance, but it only outperforms the SURF detector when comparing the smallest eigenvalues, and performs the worst of all evaluated detectors when comparing the outlier ratios. FAST is vastly outperformed by the gradient based detectors on all metrics.

The corner detector used in the ORB algorithm is a modified version of the FAST detector, called Oriented FAST, which is a FAST detector combined with an orientation operator, a cornerness measure, and a scaling operator [47]. As figure 4.1b shows, the additional operators introduced in the ORB algorithm cause significantly less distortion after rectification than the default FAST detector does. The mean orthogonality error, Eo, is decreased by almost 99%, while the mean aspect ratio error, Ea, is decreased by about 97%. Again, table 4.1 shows a slightly different result: the mean Sampson distance is over 17 times higher for the ORB detector than it was for the FAST detector, while the smallest eigenvalue is four orders of magnitude lower. The outlier ratio decreased by 1.7%, but this is less significant given that both outlier ratios are well below 50%.

The results in table 4.1 for both FAST and ORB are in line with what was described in section 2.2.2 about template based corner detectors generally finding a larger number of corners than gradient based corner detectors, but these corners also being less stable under different imaging conditions, as reflected in their outlier ratios. Overall, the ORB detector performs better than the FAST detector on model sheets for all metrics except the mean Sampson distance.


4.1.3 Gradient based detectors

As described in section 2.2.2, gradient based corner detectors are usually slower and less performant, i.e. they find fewer corners, than template based detectors [27], [28]. Gradient based corner detectors, however, generally find more stable corners under different image conditions than template based detectors do [27], [35]. Figure 4.1c shows that the rectification transformation calculated from the correspondences found with the Harris detector, the most popular gradient based detector, is a lot more stable than that of the FAST detector and similar to that of the ORB detector. Table 4.1 also shows that the Harris detector is able to find more stable corners: its outlier ratio is almost an order of magnitude lower than that of both the ORB and FAST detectors, while the mean Sampson distance is up to three orders of magnitude smaller for the Harris detector than for the template based detectors. The Harris detector's smallest eigenvalue is three orders of magnitude smaller than that of the FAST detector and one order of magnitude larger than the smallest eigenvalue found with the ORB detector. However, the Harris detector evaluated here is not the exact detector described by Harris in [28]. The OpenCV implementation adds two additional post-processing steps that slightly increase the quality of the matched corners. These post-processing steps are a minimum quality level that matched features must reach compared to the best found quality measure, and a minimum Euclidean distance that has to be left between two detected corners [72].

The Shi-Tomasi corner detector is an improved version of the Harris detector [30]. The Shi-Tomasi detector was originally developed for finding features that can be easily tracked between different frames of a video, which means that it maximises features for their tracking quality as opposed to their texturedness, which a lot of other corner detectors tend to focus on [30]. Because the Shi-Tomasi detector focuses less on texturedness, it also seems to perform very well at detecting features in sparse line drawings, such as model sheets. Figure 4.1d shows that the Shi-Tomasi detector causes the least distortion in model sheets after rectification of all evaluated detectors. Table 4.1 shows a smallest eigenvalue and outlier ratio that are pretty close to those of the Harris detector, albeit both slightly lower. The mean Sampson distance is slightly higher for the Shi-Tomasi detector than it is for the Harris detector, but only by 0.2 pixels.


4.1.4 SIFT and SURF detectors

The SIFT corner detector is a type of contour based detector [42], which are not well suited for wide baseline matching, as described in section 2.2.2. The SIFT detector does not appear in table 4.1 or figure 4.1 since its way of representing features is not compatible with the ORB descriptor that was used in these tests to be able to compare the different detectors. In the following section, 4.1.5, the SIFT detector will be evaluated in combination with different descriptors.

The SURF feature detector is a type of region detector that was inspired by SIFT [43]. As described in section 2.2.2, region detectors are not suitable for retrieving the epipolar geometry between two images. However, the results for the SURF detector in table 4.1 seem decent. The SURF detector scores somewhere in between the FAST and ORB detectors for the mean Sampson distance metric, and in between the template based and gradient based detectors for the outlier ratio. The SURF detector does have the smallest eigenvalue out of all evaluated detectors. Figure 4.1e shows that the SURF detector scores somewhere in between the FAST and ORB detectors when comparing the rectification distortions, but visually shows that the SURF detector is indeed unable to correctly retrieve the epipolar geometry between both images. Some of the features matched by the SURF detector in figure 4.1e aren't even inside the model. Even though the fundamental matrix estimation calculated from the SURF features is a decent estimation of some fundamental matrix, as evident from table 4.1, it is not an estimation of the fundamental matrix underlying the two images that were being evaluated in figure 4.1e.

Even though it was clear from the related work in section 2.2.2 that neither of these feature detectors would be good at retrieving the epipolar geometry between two model sheet views, they were evaluated anyway since the implementations of their detectors are included with the implementations of their descriptors in OpenCV [72]. There was no added cost to evaluating these detectors, both in terms of additional time and in terms of extra implementation work needed.


Detector      Descriptor   MSD      SEV       Outlier ratio      Eo       Ea
FAST          SIFT         42.7     1.4e-5    21/96 (21.9%)      12.7%    10.5%
FAST          SURF         1142.5   9.2e-5    23/62 (37.1%)      17.9%    18.5%
FAST          ORB          53.4     2.2e-5    17/80 (21.3%)      47.1%    27.4%
ORB           SIFT         20.1     6.6e-4    66/293 (22.5%)     9.8%     10.7%
ORB           SURF         51.0     2.9e-5    24/218 (11.0%)     3.4%     2.5%
ORB           ORB          935.4    2.0e-9    38/194 (19.6%)     0.6%     0.7%
Harris        SIFT         0.3      4.7e-10   1/38 (2.6%)        1.1%     1.3%
Harris        SURF         26.3     1.2e-7    1/31 (3.2%)        2.1%     1.2%
Harris        ORB          0.1      1.5e-8    1/33 (3.0%)        0.8%     1.0%
Shi-Tomasi    SIFT         0.4      9.8e-5    3/36 (8.3%)        3.5%     4.8%
Shi-Tomasi    SURF         434.6    5.6e-9    1/31 (3.2%)        1.0%     1.2%
Shi-Tomasi    ORB          0.3      1.9e-9    1/35 (2.9%)        0.4%     0.8%
SIFT          SIFT         164.2    5.1e-6    12/63 (19.0%)      1.6%     2.2%
SIFT          SURF         2068.1   8.6e-6    14/51 (27.5%)      5.6%     7.1%
SIFT          ORB          N/A      N/A       N/A                N/A      N/A
SURF          SIFT         6.7      2.7e-3    29/130 (22.3%)     40.7%    43.0%
SURF          SURF         89.7     1.7e-5    18/103 (17.5%)     1.5%     1.8%
SURF          ORB          323.8    2.3e-4    15/109 (13.8%)     5.3%     7.6%

Table 4.2: A comparison of the performance of different corner detectors on model sheets using different descriptors. MSD = Mean Sampson distance, SEV = Smallest eigenvalue.

4.1.5 Feature descriptors

Table 4.2 shows a comparison of different feature detector and descriptor combinations when used on model sheets. Each combination is used with the same brute-force, cross-checking matching algorithm.

The SIFT algorithm's descriptor is the most widely used feature descriptor in computer vision [27]. The SIFT descriptor uses a combination of histograms from different (sub)regions around each feature to describe features [42]. The SIFT descriptor is the best performing descriptor in table 4.2 for the FAST and SIFT detectors on all metrics, but as was shown earlier, these are two of the worst performing detectors when used on model sheets. The SIFT descriptor is the worst performing descriptor in table 4.2 for the ORB and SURF detectors on all metrics except the mean Sampson distance. For both gradient based detectors, the SIFT descriptor has a performance comparable to ORB, but slightly worse.

The SURF descriptor uses block patterns and Haar wavelet responses instead of histograms to approximate regions around features [43]. The SURF descriptor performs the same as or worse than SIFT for all feature detectors in table 4.2, except for the SURF detector itself.

The ORB descriptor is a modified version of the BRIEF descriptor called rBRIEF (rotated BRIEF), which works by computing binary strings for every feature by comparing the intensities of pairs of points around that feature [47]. As can be seen in table 4.2, the ORB descriptor performs the same as or better than the SIFT and SURF descriptors for the ORB, Harris, and Shi-Tomasi detectors.

By far the best performing combinations in table 4.2 are the combinations of the Harris and Shi-Tomasi detectors with the SIFT and ORB descriptors. Out of these four combinations, the Shi-Tomasi/ORB combination is the one that is used throughout the proposed method, since it scores either best or second best for all five metrics in table 4.2. However, replacing the Shi-Tomasi/ORB combination with any of the three other combinations should still provide good results with the proposed methodology.
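
As an illustration, the sketch below shows how such a detector/descriptor combination could be set up with OpenCV. The Shi-Tomasi parameters are the ones listed in table 4.3; creating the ORB descriptor with default parameters and the function name itself are illustrative assumptions rather than details of the prototype.

#include <opencv2/features2d.hpp>
#include <vector>

// Detect Shi-Tomasi corners in a model sheet view and describe them with ORB.
void detectAndDescribe(const cv::Mat& view,
                       std::vector<cv::KeyPoint>& keypoints,
                       cv::Mat& descriptors) {
    cv::Ptr<cv::GFTTDetector> detector = cv::GFTTDetector::create(
        1000,    // max. corners
        0.1,     // quality level
        2,       // min. distance
        11,      // block size
        7,       // gradient size
        false);  // useHarrisDetector = false, i.e. Shi-Tomasi
    detector->detect(view, keypoints);

    cv::Ptr<cv::ORB> orb = cv::ORB::create();  // descriptor only, default parameters
    orb->compute(view, keypoints, descriptors);
}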

4.1.6 Corner matching

Since the number of corners detected in model sheet views is generally low, especially when using the Shi-Tomasi detector, corner matching can be done using a brute-force matching algorithm. The algorithm used in the development of a prototype for the proposed method was OpenCV’s BFMatcher, using the Hamming norm, since that is the norm preferred by the ORB descriptor, and using cross-checking. As shown in table 4.4, disabling the cross-checking match validity test slightly increases both the number of matched features and the outlier rate, while increasing the mean Sampson distance by a couple of orders of magnitude, indicating that the features matched without cross-checking are of significantly lower quality. The rectification distortion metrics remain the same or improve slightly when cross-checking is disabled, but these values are already very low in both cases, so this change is less significant. OpenCV’s BFMatcher implementation has no parameters to compare other than the norm to use and whether or not to use cross-checking [72].
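
Continuing the previous sketch, matching the descriptors of two views with a brute-force matcher, the Hamming norm, and cross-checking could then look as follows (the function name is a placeholder):

#include <opencv2/features2d.hpp>
#include <vector>

// Brute-force matching with the Hamming norm (suited to ORB's binary descriptors)
// and cross-checking enabled, so a match is only kept if it is mutual.
std::vector<cv::DMatch> matchDescriptors(const cv::Mat& descLeft, const cv::Mat& descRight) {
    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(descLeft, descRight, matches);
    return matches;
}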


Algorithm    OpenCV implementation                     Parameters
FAST         FastFeatureDetector                       Threshold: 38
ORB          ORB                                       Max. features: 450; Pyramid scale factor: 1.125; Number of pyramid levels: 14; First pyramid level: 0; Patch size: 37; Edge threshold: 37; FAST threshold: 6
Harris       GFTTDetector (useHarrisDetector=true)     Max. corners: 1000; Quality level: 0.011; Min. distance: 3; Block size: 6; Gradient size: 7; k: 0.04
Shi-Tomasi   GFTTDetector (useHarrisDetector=false)    Max. corners: 1000; Quality level: 0.1; Min. distance: 2; Block size: 11; Gradient size: 7
SIFT         SIFT                                      Max. features: 1000; Octave layers: 3; Contrast threshold: 0.15; Edge threshold: 22; Sigma: 1.6
SURF         SURF                                      Hessian threshold: 1000; Number of octaves: 2; Octave layers: 4

Table 4.3: The different feature detectors and descriptors that were evaluated, the name of their OpenCV implementations, and the parameters that were used during the evaluations.


Quality metric          Cross-checking enabled   Cross-checking disabled
Mean Sampson distance   0.3 pixels               129.1 pixels
Smallest eigenvalue     1.9e-9                   8.4e-9
Outlier ratio           1/35 (2.9%)              5/44 (11.4%)
Eo                      0.4%                     0.4%
Ea                      0.8%                     0.7%

Table 4.4: A comparison of the difference in quality metrics when enabling or disabling cross-checking in the brute-force matcher for the Shi-Tomasi/ORB combination.

4.2 Epipolar Geometry

In order to calculate an estimate of the fundamental matrix, a 7-point RANSAC algorithm is used. This RANSAC algorithm has two parameters that influence its performance: the reprojection threshold, which specifies the maximum distance between a point and the epipolar line of its matched corresponding point beyond which the point is considered an outlier, and the desired confidence level (or probability) that the estimated fundamental matrix is correct [72]. In the prototype for the proposed methodology, OpenCV’s findFundamentalMat implementation is used with the reprojection threshold set to 2 pixels and the confidence level set to 99%. Both of these parameters have been determined using a similar form of iterative optimisation as was used when determining the parameters for the detectors and descriptors in section 4.1. Figure 4.3 shows graphs plotting the mean Sampson distance, the outlier ratio, and both rectification metrics for varying values of the reprojection threshold (figure 4.3a) and the confidence level (figure 4.3b).
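
A minimal sketch of this call, assuming ptsLeft and ptsRight hold the pixel coordinates of the matched corners from the previous stage (the wrapper function itself is illustrative):

#include <opencv2/calib3d.hpp>
#include <vector>

// Estimate the fundamental matrix with a RANSAC scheme, a 2-pixel reprojection
// threshold and a 99% confidence level; inlierMask marks the matches that were kept.
cv::Mat estimateFundamentalMatrix(const std::vector<cv::Point2f>& ptsLeft,
                                  const std::vector<cv::Point2f>& ptsRight,
                                  cv::Mat& inlierMask) {
    return cv::findFundamentalMat(ptsLeft, ptsRight, cv::FM_RANSAC,
                                  /*ransacReprojThreshold=*/2.0,
                                  /*confidence=*/0.99,
                                  inlierMask);
}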

The three metrics derived from the fundamental matrix estimate – the mean Sampson distance, the smallest eigenvalue, and the outlier ratio – can be used to give the system feedback about the quality of the correspondences that were matched in the previous stage. This feedback can then be used to redo the feature matching step with different parameters, in the hope of getting a better result before continuing with the next step in the process.


Figure 4.3: Different quality metrics as a function of the parameters of the 7-point RANSAC algorithm. (a) Metrics as a function of the reprojection threshold. (b) Metrics as a function of the confidence level.

4.3 Rectification

Similar to the quality metrics derived from the fundamental matrix, the two metrics derived from the rectification transformation, Eo and Ea, can be used to give the system feedback about the quality of the matched point correspondences. Calculating the rectification matrix is very quick, so the metrics derived from the rectification transformation can be combined with the metrics derived from the fundamental matrix, optimising the system for all five metrics at once.


4.4 Disparity Map

In the proposed method, the disparity map is calculated using OpenCV’s StereoSGBM implementation, which is an adaptation of Hirschmüller’s Semiglobal Matching algorithm described in section 2.2.6. The adaptations made in the OpenCV implementation are computational optimisations as well as pre- and postfiltering methods, described in more detail in section 3.6.

Before settling on Hirschmüller’s Semiglobal Matching, a local-approach block matcher was used, but its results were not great. Figure 4.4 shows a typical disparity map as calculated by a local-approach block matcher when evaluating model sheets. The local-approach matcher is able to retrieve the lines of the model sheet in figure 4.4, as well as the surface of the relatively small stick, but cannot retrieve the empty spaces in between. Local matchers have trouble filling in gaps that contain no texture and that are larger than their block size. These types of large, empty gaps appear a lot in model sheets, hence the poor performance of local block matchers on model sheets.
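
For reference, OpenCV’s StereoBM is one example of such a local block matcher; a minimal sketch is given below. The parameter values are purely illustrative and are not the ones that were evaluated here.

#include <opencv2/calib3d.hpp>

// A plain local block matcher on a rectified image pair.
cv::Mat localBlockMatch(const cv::Mat& rectifiedLeft, const cv::Mat& rectifiedRight) {
    // numDisparities must be a multiple of 16; blockSize must be odd.
    cv::Ptr<cv::StereoBM> matcher = cv::StereoBM::create(/*numDisparities=*/32, /*blockSize=*/21);
    cv::Mat disparity;
    matcher->compute(rectifiedLeft, rectifiedRight, disparity);
    return disparity;  // 16-bit fixed-point disparities, scaled by 16
}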

Figure 4.4: A typical disparity map found with local-approach block matchers on model sheets. The spaces between lines are empty.

Figure 4.5 shows the different methods that were applied to try to improve the quality of local-approach methods. The first of these methods was to apply static noise to the background of both model sheet views to help the disparity algorithm determine the location of vertices between views. As can be seen in figure 4.5b, this did not work, however: the local-approach methods simply try to match the backgrounds together. Another method that was considered was to apply a procedural texture to the model sheet itself before executing the proposed method. Before actual procedural textures were implemented, this idea was tested by adding busy textures to the generated model sheets. The results of this method can be seen in figure 4.5d, but once again, the local approach simply tries to match the lines in the textures themselves and is unable to fill in the gaps in between. Finally, a flood fill method was implemented to try to average out the disparity values, but while this improved the disparity map for completely flat objects such as a cube, it did not work on more complex model sheets, such as the one shown in figure 4.5e.

In order to evaluate and optimise the performance of the disparity map stage, a quality metric is needed that compares a disparity map to one of the generated ground truth images from section 3.2. During the development of the prototype for the proposed method, multiple metrics were tried, including a similarity measure, the mean of squared errors, the mean of absolute errors, the correlation between a disparity map and its ground truth, the correlation between the histogram of a disparity map and the histogram of its ground truth, and the Chi-square distance between both histograms. Other than the correlation between a disparity map and its ground truth, none of these metrics seemed very good at quantifying the quality of the generated disparity map. In the end, the "bad 2.0" metric from Middlebury’s stereo dataset [73] was used in combination with the disparity correlation. The bad 2.0 metric returns the percentage of non-occluded bad pixels with a disparity error larger than 2.0 pixels. Listing 4.1 shows the algorithm that was used to calculate the correlation between a disparity map and its ground truth, and listing 4.2 shows the implementation of the bad 2.0 metric as it was used in the proposed method. A higher disparity correlation and a lower value for the bad 2.0 metric indicate better quality disparity maps. The disparity correlation can also become negative, indicating that the disparity map has been inverted, i.e. the light areas in the ground truth are calculated as dark areas and vice versa.

Listing 4.1: Calculating the correlation between disparity maps

double correlation(Mat m1, Mat m2, Mat mask) {
    // Pearson correlation between two disparity maps, evaluated only on the pixels in mask.
    int n_pixels = countNonZero(mask);

    // Compute mean and standard deviation of both images
    Scalar im1_mean, im1_std, im2_mean, im2_std;
    meanStdDev(m1, im1_mean, im1_std, mask);
    meanStdDev(m2, im2_mean, im2_std, mask);

    // Compute covariance and correlation coefficient
    Mat m1_c, m2_c;
    subtract(m1, im1_mean, m1_c, mask);
    subtract(m2, im2_mean, m2_c, mask);
    double covar = m1_c.dot(m2_c) / n_pixels;
    double correlation = covar / (im1_std[0] * im2_std[0]);

    return correlation;
}

Listing 4.2: Implementation of the Middlebury bad 2.0 metric

double bad20(Mat m1, Mat m2, Mat mask) {
    Mat bad_pixels;
    subtract(m2, m1, bad_pixels, mask);
    bad_pixels = abs(bad_pixels);

    // Cull all pixels with a disparity error <= 1.99
    threshold(bad_pixels, bad_pixels, 1.99, 255, THRESH_BINARY);
    int n_bad_pixels = countNonZero(bad_pixels);
    int n_pixels = countNonZero(mask);
    double result = n_bad_pixels * 1.0 / n_pixels;

    return result;
}
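
For completeness, a hypothetical way of calling these two metrics on a computed disparity map, assuming that a value of 0 in the ground truth marks occluded or invalid pixels:

#include <iostream>

// Hypothetical evaluation helper combining both metrics defined above.
void evaluateDisparity(const Mat& disparity, const Mat& groundTruth) {
    Mat mask = groundTruth > 0;  // assumed mask of valid (non-occluded) pixels
    double corr = correlation(disparity, groundTruth, mask);
    double bad = bad20(disparity, groundTruth, mask);
    std::cout << "correlation: " << corr << ", bad 2.0: " << bad << std::endl;
}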

After running the same iterative optimisation method from the previous sections using these quality metrics, the parameters shown in table 4.5 were chosen. Table 4.5 shows the default values proposed by the OpenCV framework, as well as the values used as defaults in the proposed method. Although these parameters should be decent starting points for most model sheets, they may need to be altered slightly in order to get the optimal disparity map for a given model sheet, since these parameters depend very heavily on the model sheets themselves. The mode of the method was left at its default, which uses the single-pass five-path cost implementation, in order to reduce the memory requirements of the proposed method.

Parameter                              Default value   Proposed value
Min. disparity                         0               0
Number of disparities                  16              32
Block size                             5               37
P1                                     2               970
P2                                     5               5970
Max. left-right disparity difference   1               14
Pre-filter cap                         1               15
Uniqueness ratio                       10              0
Speckle window size                    0               200
Speckle range                          0               2

Table 4.5: A comparison of OpenCV’s default Semiglobal Matching parameters and the parameters proposed for use on model sheets.

The most remarkable change in table 4.5 is probably the large block size of 37 pixels instead of OpenCV’s default block size of 5 pixels. The benefit of using such a large block size is that the disparity map will be smoother [72] and that the algorithm will be able to fill in a lot of the empty spaces in model sheets. The downside of using a large block size is that a lot of the smaller details can get lost, as can be seen in figure 4.7c, where the character’s stick is almost completely lost due to the large block size, while it is still visible in figure 4.7b, which uses a smaller block size. A smaller block size is also better at retrieving the curvature of round objects, such as the cone and sphere of the model in figure 4.7b. Larger block sizes are usually better at retrieving the general shape of a model, while smaller block sizes are better at retrieving the shapes of details in the models. Figure 4.6 shows the value of the disparity correlation and the bad 2.0 metric for different block sizes, showing that the overall quality of the generated disparity maps increases with larger block sizes, but only up to a certain point, after which it begins to decrease again.
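
A minimal sketch of how the proposed parameter set from table 4.5 maps onto OpenCV’s StereoSGBM interface; the wrapper function and the names of the rectified views are placeholders:

#include <opencv2/calib3d.hpp>

// Semiglobal matching with the parameters proposed in table 4.5.
cv::Mat computeModelSheetDisparity(const cv::Mat& rectifiedLeft, const cv::Mat& rectifiedRight) {
    cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(
        /*minDisparity=*/0, /*numDisparities=*/32, /*blockSize=*/37,
        /*P1=*/970, /*P2=*/5970, /*disp12MaxDiff=*/14,
        /*preFilterCap=*/15, /*uniquenessRatio=*/0,
        /*speckleWindowSize=*/200, /*speckleRange=*/2);
    cv::Mat disparity;
    sgbm->compute(rectifiedLeft, rectifiedRight, disparity);
    return disparity;  // 16-bit fixed-point disparities, scaled by 16
}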


Figure 4.6: The disparity quality metrics as a function of the algorithm’s block size.

The final disparity map for a given view is calculated as the mean of the disparity map from the image pair where that view is the right image and the disparity map from the image pair where that view is the left image. For values that are occluded in one of the views, only the values of the other view are used. When using this method on different model sheets, the bad 2.0 metric is almost always at least as low for the mean of both disparity maps as for either disparity map separately. The disparity correlation for the mean disparity map is often as high as or higher than for either disparity map separately, but less consistently so than the bad 2.0 metric. An example of this combination can be seen in figure 4.8. The biggest advantage of this technique of combining both views is that it can be done without needing the ground truth disparity map.
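
A sketch of this combination step, assuming both disparity maps are given as floating-point images expressed in the same view and that invalid or occluded pixels are marked with the value 0 (both assumptions are illustrative rather than taken from the prototype):

#include <opencv2/core.hpp>

// dispAsLeft / dispAsRight: disparity maps (CV_32F) for the same view, 0 = occluded/invalid.
cv::Mat combineDisparities(const cv::Mat& dispAsLeft, const cv::Mat& dispAsRight) {
    cv::Mat validLeft  = dispAsLeft  > 0;   // masks of valid pixels
    cv::Mat validRight = dispAsRight > 0;
    cv::Mat both = validLeft & validRight;

    cv::Mat combined = cv::Mat::zeros(dispAsLeft.size(), dispAsLeft.type());
    dispAsLeft.copyTo(combined, validLeft);    // keep the only available value...
    dispAsRight.copyTo(combined, validRight);
    cv::Mat mean = 0.5 * (dispAsLeft + dispAsRight);
    mean.copyTo(combined, both);               // ...and average where both maps are valid
    return combined;
}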

Figure 4.8: The left-view and right-view disparity maps, and their combination. (a) Disparity map for the image pair where this view is the left view (correlation: 0.40, bad 2.0: 0.49). (b) Disparity map for the image pair where this view is the right view (correlation: 0.49, bad 2.0: 0.32). (c) The mean of the disparity maps in (a) and (b) (correlation: 0.49, bad 2.0: 0.31).


The minimum disparity parameter in table 4.5 defines the starting point of the matching algorithm. It is highly dependent on the quality of the rectification transformation. If the rectification transformation introduces a high amount of distortion, such as in figure 4.1a, some pixels in the right view might end up with a smaller x-coordinate than that of the corresponding pixel in the left view, and thus a negative minimum disparity would be needed. In order to support these cases, the prototype of the proposed method contains an iterative optimisation algorithm that iterates the minimum disparity parameter through a range symmetric around zero. It evaluates the bad 2.0 metric at each value of the minimum disparity and in the end selects the minimum disparity that provides the best value for the metric. Because the bad 2.0 metric is used, however, this approach can only be used when the ground truth is available.
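
Schematically, this search could look as follows; the computeDisparity callback is assumed to wrap a Semiglobal Matching call such as the one sketched earlier, bad20 is the metric from listing 4.2, and the search range of -32 to 32 is an illustrative assumption:

#include <opencv2/core.hpp>
#include <functional>

// Search a symmetric range of minimum disparities for the value that minimises the bad 2.0 metric.
int findBestMinimumDisparity(const cv::Mat& rectifiedLeft, const cv::Mat& rectifiedRight,
                             const cv::Mat& groundTruth, const cv::Mat& mask,
                             const std::function<cv::Mat(const cv::Mat&, const cv::Mat&, int)>& computeDisparity) {
    int bestMinDisparity = 0;
    double bestScore = 1.0;  // bad 2.0 is a ratio in [0, 1]; lower is better
    for (int minDisparity = -32; minDisparity <= 32; ++minDisparity) {
        cv::Mat disparity = computeDisparity(rectifiedLeft, rectifiedRight, minDisparity);
        double score = bad20(disparity, groundTruth, mask);
        if (score < bestScore) {
            bestScore = score;
            bestMinDisparity = minDisparity;
        }
    }
    return bestMinDisparity;
}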

The disparity map and the point cloud are directly related to each other, as the depth of each point is inversely proportional to its disparity value; evaluating the point cloud itself is therefore not beneficial. After the point cloud has been retrieved, a convex hull can be generated for it using a hull generation algorithm such as Quickhull [74]. The convex hull retrieved this way can be compared to the original 3D model used to generate the model sheets with a metric comparing the distance between 3D models, such as the Hausdorff distance [75].
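
As an illustration of that relationship, the standard pinhole stereo relation Z = f·B/d can be used to turn disparities into depths; in the sketch below, focalPx and baseline are assumed properties of the virtual cameras used to render the views, not values taken from the prototype.

#include <opencv2/core.hpp>

// Convert a disparity map (CV_32F, in pixels) to a depth map using Z = f * B / d.
cv::Mat disparityToDepth(const cv::Mat& disparity, double focalPx, double baseline) {
    cv::Mat depth(disparity.size(), CV_32F, cv::Scalar(0));
    for (int y = 0; y < disparity.rows; ++y)
        for (int x = 0; x < disparity.cols; ++x) {
            float d = disparity.at<float>(y, x);
            if (d > 0.0f)  // skip invalid/occluded pixels
                depth.at<float>(y, x) = static_cast<float>(focalPx * baseline / d);
        }
    return depth;
}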

4.4.1 Evaluating the Global Method

In order to evaluate the global method, different datasets with known ground truth were used. Some of these models were created specifically for evaluating the proposed method, while others are open-source 3D models often used in research. The model sheets and ground truth disparity maps for all evaluated models are generated using the method described in section 3.2. Table 4.6 shows an overview of all evaluated datasets and the source of the original 3D model.

Table 4.7 contains the results of the proposed method on the datasets described in table 4.6. The angle between the model sheets used to calculate the disparity map is 5° for most datasets, except for the Batman dataset, where it is 15°. Table 4.8 shows the block sizes and minimum disparity values used to retrieve the results in table 4.7.

Some interesting results can be derived from the Objects dataset, which shows that the proposed method can be used to reconstruct objects that are completely separate in the model sheets. The disparity coverage of the lone-standing cone in 4.7 is rather small, but this is due to the large block size. It seldom happens that a real model sheet contains two separate models in the same view, but this finding is interesting nonetheless.

Other interesting datasets to look at are the two Torus datasets. The torus that is lying flat looks exactly the same in all of its model sheets, as the camera rotates around the torus’ revolution axis. The torus that is standing on its side rotates in the different model sheet views and thus changes considerably between views, yet the results are very similar for both datasets. The only difference is that the correlation for the Torus (flat) dataset is inverted, which indicates that the method has switched the front and back sides of the model.

The two best reconstructions in table 4.7 are the Stanford Bunny and the Utah Teapot. The model sheets for both of these models, visible in table 4.6,


Figure 4.5: Some of the methods that were tried unsuccessfully to improve the quality of local-approach matchers on model sheets. (a) Applying static background noise to the model sheets; (b) disparity map for (a); (c) applying procedural textures to the model sheets; (d) disparity map for (c); (e) applying flood fill to average out empty areas.


Figure 4.7: Disparity maps generated with a large and a smaller block size, together with their ground truth disparity map. (a) The ground truth. (b) Disparity map generated with block size 21 (correlation: 0.11, bad 2.0: 0.72). (c) Disparity map generated with block size 37 (correlation: 0.50, bad 2.0: 0.30).


Dataset          3D model source
Batman           Custom model
Stanford Bunny   Stanford University [76]
Cube             Custom model
Dude             Custom model
Objects          Custom model
Suzanne          Included in Blender 2.79
Torus (flat)     Custom model
Torus (side)     Custom model
Utah Teapot      Included in Blender 2.79

Table 4.6: The datasets used to evaluate the global method.


Dataset          Bad 2.0   Correlation
Batman           0.58      -0.27
Stanford Bunny   0.09      0.23
Cube             0.29      0.54
Dude             0.31      0.49
Objects          0.50      0.11
Suzanne          0.47      -0.07
Torus (flat)     0.48      -0.22
Torus (side)     0.45      0.22
Utah Teapot      0.13      0.55

Table 4.7: Quality metrics of the reconstructed disparity maps for all evaluated datasets.


Dataset          Block size   Minimum disparity
Batman           15           12
Stanford Bunny   7            -18
Cube             37           -26
Dude             37           0
Objects          23           14
Suzanne          37           0
Torus (flat)     37           0
Torus (side)     25           -17
Utah Teapot      15           -25

Table 4.8: The parameters used to generate the results in table 4.7.


Chapter 5

Conclusion

The goal of this thesis was to provide a better starting point for 3D modellers than the often-used default cube. Whether or not this goal was accomplished depends on the definition of a "better" starting point that is used. The proposed method shows some promising results when used on model sheets, but the final models are currently not yet usable in production environments. With a lot of manual tweaking, it is possible to get some good approximations, though, as was shown for the Utah Teapot and the Stanford Bunny datasets.

The method used to automatically generate model sheets provides some nice looking results. This method is clearly not usable in production environments to generate model sheets, as the 3D model would be needed to generate the model sheet, but it could, for example, be used as a stylistic choice in a 3D production such as a video game or animated movie. The method used to generate model sheets could also be used in further research regarding model sheets and computer vision, as the availability of a ground truth disparity map is absolutely necessary to evaluate the generated 3D models. Some pointers towards possible areas of further research regarding model sheets and computer vision are given in the next and final chapter.


Chapter 6

Future Work

6.1 Possible Alternatives

In this section, some possible alternative ways to generate (the basis for) 3D models from a model sheet are briefly discussed. Some of these were considered before settling on the method proposed in this thesis, others came up while working on the proposed method, but all of them might provide interesting alternatives to the method described in this thesis.

Firstly, it might be possible to create at least a good basis for 3D modellers to work on by sculpting a voxel model on top of the different model sheet views. Using an infinite voxel tree, the current voxel model could repeatedly be projected onto each of the views, chipping away all voxels that fall completely outside of the outline on the model sheets, before moving on to a finer resolution. Once the desired resolution is reached, a marching cubes algorithm could be used to smooth the model. This method will probably lose a lot of the details that fall inside the outline of the model on the model sheets, but it could provide at least a good basis for 3D modellers to further refine the model, as illustrated by the sketch below.
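
A minimal sketch of the carving loop at a single, fixed resolution; the projection function and the binary silhouette masks are assumed to be supplied by the caller, and an octree refinement plus a marching cubes pass would follow:

#include <opencv2/core.hpp>
#include <functional>
#include <vector>

// Carve a fixed-resolution voxel grid against binary silhouette masks of each view.
// 'project' maps a voxel centre to pixel coordinates in a given view (assumed supplied by the caller).
std::vector<bool> carveVoxels(int gridSize,
                              const std::vector<cv::Mat>& silhouettes,
                              const std::function<cv::Point2i(const cv::Point3f&, int)>& project) {
    std::vector<bool> occupied(static_cast<size_t>(gridSize) * gridSize * gridSize, true);
    for (int z = 0; z < gridSize; ++z)
        for (int y = 0; y < gridSize; ++y)
            for (int x = 0; x < gridSize; ++x) {
                cv::Point3f centre(x + 0.5f, y + 0.5f, z + 0.5f);
                for (size_t v = 0; v < silhouettes.size(); ++v) {
                    cv::Point2i p = project(centre, static_cast<int>(v));
                    // Remove the voxel if it projects outside the model's outline in any view.
                    if (silhouettes[v].at<uchar>(p) == 0) {
                        occupied[(static_cast<size_t>(z) * gridSize + y) * gridSize + x] = false;
                        break;
                    }
                }
            }
    return occupied;
}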

It might also be interesting to try to use some form of machine learning to generate these 3D models. An algorithm could be trained to recognise certain body parts, for example, and then taught how to translate those parts into 3D models. The main drawback of a machine learning method is that the final algorithm will depend heavily on the set of training data that was used. This might be less of a problem if the training data is provided by the 3D modellers themselves, and if they only want to use it to generate a specific set of 3D models (e.g. guns for a first-person shooter game). This would be comparable to the template-based modelling discussed in section 2.1.1, but with a self-learning database of templates.

Lastly, another possible alternative for generating 3D models based on model sheets is to use a template library with an algorithm that tries to match different 3D shapes onto the model sheet. This subject was briefly touched upon in section 2.1.1, but these algorithms seem never to have been tested on model sheets or other, similar kinds of images.

6.2 Possible Extensions and Adaptations

In this section, some possible extensions and adaptations that could be made to the method proposed in this thesis are briefly discussed. These are mainly ideas that might provide interesting results in future research.

Since the different correspondences are selected on the model sheet views themselves, it should be fairly trivial to use them to extract a set of UV values for the correspondences in one of the views. If the method described in this thesis were used to create only the outline of the mesh instead of the entire model, this (probably low-poly) mesh could be combined with the retrieved UV values to map the different model sheet views onto the mesh. It might also be possible to use the disparity map to determine the (approximate) normals of the model and combine these with the texture-mapped model to create a low-poly version of the model that might just provide enough illusion of depth for certain applications. Instead of using it to generate a normal map, the disparity map could also be used to deform the simple model directly, making it more complex. Such a method might provide a good starting point for 3D modellers to further refine the model.

Instead of calculating the disparity map for as many pixels as possible, there also exist algorithms that use a modified version of the SIFT algorithm to only calculate disparities for a number of detected features in the image [67], [77]. Using a similar method, it might be possible to calculate the disparities for all corners in the model sheets and use a convex hull to generate the corresponding 3D model. When using a convex hull, it is not really necessary to calculate the disparities for all pixels along a straight edge anyway, so this method might create less noise in the final result. The downside of a method like this is that the fewer vertices are generated, the more an error in one of the vertices will influence the quality of the final model.


As was discussed in section 4.4, different block sizes for the disparity calculation methods can detect different levels of detail. Large block sizes are better at detecting the general shape of the model, while smaller block sizes are better at detecting small details on a model. A new method could be devised that combines multiple disparity maps, generated using different block sizes, into one disparity map that works both for the large-scale shape and for the smaller-scale details, for example along the lines of the sketch below.
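
One naive way to combine two such maps, given purely as an illustration of the idea; preferring the small-block values wherever they are valid is an assumption, not something that was evaluated in this thesis:

#include <opencv2/core.hpp>

// Combine a coarse disparity map (large block size) with a detailed one (small block size):
// keep the detailed value where it is valid, fall back to the coarse value elsewhere.
cv::Mat fuseDisparities(const cv::Mat& coarse, const cv::Mat& detailed) {
    cv::Mat fused = coarse.clone();
    cv::Mat detailValid = detailed > 0;   // 0 is assumed to mark invalid pixels
    detailed.copyTo(fused, detailValid);
    return fused;
}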


Bibliography

[1] M. R. Mehta, M. Y. Bhatt, M. Y. Joshi, and M. S. Vidhani, “Animated movie making using a game engine”, Int. J. Emerg. Trends Sci. Technol., vol. 2, pp. 2320–2324, 2015.

[2] P. Wells, “Basics animation 03: Drawing for animation”, Bloomsbury Publishing, 2008, vol. 3, p. 21.

[3] David dias: Batman 3d toon shader/modeling, [Online]. Available: https://www.artstation.com/artwork/X9bl0 (visited on 05/31/2020).

[4] T. Igarashi, S. Matsuoka, and H. Tanaka, “Teddy: A sketching interface for 3d freeform design”, in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 409–416, isbn: 0-201-48560-5. doi: 10.1145/311535.311602. [Online]. Available: http://dx.doi.org/10.1145/311535.311602.

[5] T. Ju, “Fixing geometric errors on polygonal models: A survey”, Journal of Computer Science and Technology, vol. 24, no. 1, pp. 19–29, 2009.

[6] A. Paquette, “Cg modeling 2: Nurbs”, in An Introduction to Computer Graphics for Artists. London: Springer London, 2013, pp. 247–277, isbn: 978-1-4471-5100-5. doi: 10.1007/978-1-4471-5100-5_14. [Online]. Available: https://doi.org/10.1007/978-1-4471-5100-5_14.

[7] ——, “Modeling 1: Polygons”, in An Introduction to Computer Graphics for Artists. Springer London, 2013, pp. 69–89, isbn: 978-1-4471-5100-5. doi: 10.1007/978-1-4471-5100-5_6. [Online]. Available: https://doi.org/10.1007/978-1-4471-5100-5_6.

[8] I. Biederman, “Recognition-by-Components: A Theory of Human Image Understanding”, Psychological Review, vol. 94, no. 2, pp. 115–147, 1987, issn: 0033295X. doi: 10.1037/0033-295X.94.2.115.


[9] H. Barrow, J. Tenenbaum, A. Hanson, and E. Riseman, “Recovering intrinsic scene characteristics”, Comput. Vis. Syst, vol. 2, no. 3-26, p. 2, 1978.

[10] H. G. Barrow and J. M. Tenenbaum, “Interpreting line drawings as three-dimensional surfaces”, in AAAI-80 Proceedings, 1980, pp. 11–14.

[11] X. Yin, P. Wonka, and A. Razdan, “Generating 3D Building Models from Architectural Drawings: A Survey”, IEEE Computer Graphics and Applications, vol. 29, no. 1, pp. 20–30, 2009, issn: 0272-1716. doi: 10.1109/MCG.2009.9. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4736453.

[12] L. Olsen, F. F. Samavati, M. C. Sousa, and J. A. Jorge, “Sketch-based modeling: A survey”, Computers and Graphics (Pergamon), vol. 33, no. 1, pp. 85–103, 2009, issn: 00978493. doi: 10.1016/j.cag.2008.09.013.

[13] D. H. Ballard and C. M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, NJ, 1982, isbn: 0-13-165316-4. [Online]. Available: https://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/Ballard%7B%5C_%7D%7B%5C_%7DD.%7B%5C_%7Dand%7B%5C_%7DBrown%7B%5C_%7D%7B%5C_%7DC.%7B%5C_%7DM.%7B%5C_%7D%7B%5C_%7D1982%7B%5C_%7D%7B%5C_%7DComputer%7B%5C_%7DVision.pdf.

[14] R. C. Zeleznik, K. P. Herndon, and J. F. Hughes, “SKETCH: An Interface for Sketching 3D Scenes”, Computer Graphics Proceedings, Annual Conference Series (SIGGRAPH ’96), pp. 163–170, 1996, issn: 0097-8930. doi: 10.1145/1281500.1281530. [Online]. Available: https://dl.acm.org/citation.cfm?id=1281530%20http://dl.acm.org/citation.cfm?doid=1281500.1281530.

[15] H. Shin and T. Igarashi, “Magic canvas: Interactive design of a 3-d scene prototype from freehand sketches”, in Proceedings of Graphics Interface 2007, ser. GI ’07, Montreal, Canada: ACM, 2007, pp. 63–70, isbn: 978-1-56881-337-0. doi: 10.1145/1268517.1268530. [Online]. Available: http://doi.acm.org/10.1145/1268517.1268530.

[16] Asset forge by kenney, [Online]. Available: https://assetforge.io/ (visited on05/31/2020).

[17] K. P. Chansri N., “Image-Based Direct Slicing of a Single Line Drawing for Rapid Prototyping”, in Innovative Developments in Virtual and Physical Prototyping: Proceedings of the 5th International Conference on Advanced Research in Virtual and Rapid Prototyping, 2011, pp. 241–247, isbn: 9780203181416.


[18] Solidworks for 3d mechanical drawing, [Online]. Available: https://xyberpast.blogspot.com/2014/05/solidworks-for-3d-mechanical-drawing.html (visitedon 05/31/2020).

[19] D. Pugh, “Designing solid objects using interactive sketch interpretation”, in Proceedings of the 1992 Symposium on Interactive 3D Graphics, ser. I3D ’92, Cambridge, Massachusetts, USA: ACM, 1992, pp. 117–126, isbn: 0-89791-467-8. doi: 10.1145/147156.147178. [Online]. Available: http://doi.acm.org/10.1145/147156.147178.

[20] B. De Araújo and J. Jorge, “Blobmaker: Free-form modelling with variational implicit surfaces”, in Proceedings of The IEEE - PIEEE, vol. 12, Jan. 2003, pp. 335–342. doi: 10.1.1.104.8535.

[21] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction algorithms”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 519–528, 2006, issn: 10636919. doi: 10.1109/CVPR.2006.19. arXiv: 10.1.1.62.1019. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs%7B%5C_%7Dall.jsp?arnumber=1640800.

[22] L. Moisan, P. Moulon, and P. Monasse, “Fundamental matrix of a stereo pair, with a contrario elimination of outliers”, Image Processing On Line, vol. 6, pp. 89–113, 2016.

[23] R. I. Hartley, “Theory and Practice of Projective Rectification”, International Journal of Computer Vision, vol. 35, no. 2, pp. 115–127, 1998, issn: 1573-1405. doi: 10.1023/A:1008115206617.

[24] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. The Pitt Building, Trumpington Street, Cambridge, United Kingdom: Cambridge University Press, 2000.

[25] F. Riggi, M. Toews, and T. Arbel, “Fundamental matrix estimation via tip-transfer of invariant parameters”, in 18th International Conference on Pattern Recognition (ICPR’06), IEEE, vol. 2, 2006, pp. 21–24.

[26] Z. Zhang, R. Deriche, O. Faugeras, and Q. T. Luong, “A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry”, Artificial Intelligence, vol. 78, no. 1-2, pp. 87–119, 1995, issn: 00043702. doi: 10.1016/0004-3702(95)00022-4.


[27] Y. Li, S. Wang, Q. Tian, and X. Ding, “A survey of recent advances in visual feature detection”, Neurocomputing, vol. 149, pp. 736–751, 2015.

[28] C. G. Harris, M. Stephens, et al., “A combined corner and edge detector.”, in Alvey vision conference, Citeseer, vol. 15, 1988, pp. 10–5244.

[29] C. Tomasi and T. Kanade, “Detection and tracking of point features”, 1991.

[30] J. Shi and C. Tomasi, “Good features to track”, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994, pp. 593–600.

[31] S. M. Smith and J. M. Brady, “Susan—a new approach to low level image processing”,International journal of computer vision, vol. 23, no. 1, pp. 45–78, 1997.

[32] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection”, in European conference on computer vision, Springer, 2006, pp. 430–443.

[33] E. Rosten, R. Porter, and T. Drummond, “Faster and better: A machine learning approach to corner detection”, IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 1, pp. 105–119, 2008.

[34] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger, “Adaptive and generic corner detection based on the accelerated segment test”, in European conference on Computer vision, Springer, 2010, pp. 183–196.

[35] H. Aanæs, A. L. Dahl, and K. S. Pedersen, “Interesting interest points”, International Journal of Computer Vision, vol. 97, no. 1, pp. 18–35, 2012.

[36] X. Zhang, H. Wang, A. W. Smith, X. Ling, B. C. Lovell, and D. Yang, “Corner detection based on gradient correlation matrices of planar curves”, Pattern recognition, vol. 43, no. 4, pp. 1207–1223, 2010.

[37] T. Lindeberg, “Feature detection with automatic scale selection”, International journal of computer vision, vol. 30, no. 2, pp. 79–116, 1998.

[38] P.-L. Shui and W.-C. Zhang, “Corner detection and classification using anisotropic directional derivative representations”, IEEE Transactions on Image Processing, vol. 22, no. 8, pp. 3204–3218, 2013.

[39] A. Willis and Y. Sui, “An algebraic model for fast corner detection”, in 2009 IEEE 12th International Conference on Computer Vision, IEEE, 2009, pp. 2296–2302.

[40] G.-S. Xia, J. Delon, and Y. Gousseau, “Accurate junction detection and characterization in natural images”, International journal of computer vision, vol. 106, no. 1, pp. 31–56, 2014.


[41] P. Moreels and P. Perona, “Evaluation of features detectors and descriptors based on 3d objects”, International journal of computer vision, vol. 73, no. 3, pp. 263–284, 2007.

[42] D. G. Lowe, “Distinctive image features from scale-invariant keypoints”, International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.

[43] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features”, in European conference on computer vision, Springer, 2006, pp. 404–417.

[44] U. of British Columbia, “Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image”, U.S. Patent 6,711,293 B1, Mar. 23, 2004. [Online]. Available: https://patentimages.storage.googleapis.com/52/0d/f9/1c50d93b5ac7aa/US6711293.pdf.

[45] T. M. C. Katholieke Universiteit Leuven Eidgenössische Technische Hochschule Zürich, “Robust interest point detector and descriptor”, U.S. Patent 2009/0238460 A1, Sep. 24, 2009. [Online]. Available: https://patentimages.storage.googleapis.com/65/b0/89/9702acf206abd9/US20090238460A1.pdf.

[46] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features”, in European conference on computer vision, Springer, 2010, pp. 778–792.

[47] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf”, in 2011 International conference on computer vision, IEEE, 2011, pp. 2564–2571.

[48] S. Leutenegger, M. Chli, and R. Y. Siegwart, “Brisk: Binary robust invariant scalable keypoints”, in 2011 International conference on computer vision, IEEE, 2011, pp. 2548–2555.

[49] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration.”, VISAPP (1), vol. 2, no. 331-340, p. 2, 2009.

[50] ——, “Fast matching of binary features”, in 2012 Ninth conference on computer and robot vision, IEEE, 2012, pp. 404–410.

[51] R. Szeliski, Computer vision: algorithms and applications. Springer Science & Business Media, 2010.

[52] J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time”, ACM Transactions on Mathematical Software (TOMS), vol. 3, no. 3, pp. 209–226, 1977.


[53] H. Samet, The design and analysis of spatial data structures. Addison-Wesley Reading, MA, 1990, vol. 199.

[54] C. Silpa-Anan and R. Hartley, “Optimised kd-trees for fast image descriptor matching”, in 2008 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2008, pp. 1–8.

[55] R. I. Hartley, “Estimation of relative camera positions for uncalibrated cameras”, in European Conference on Computer Vision, Springer, 1992, pp. 579–587.

[56] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections”, Nature, vol. 293, no. 5828, pp. 133–135, 1981.

[57] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry, An invitation to 3D vision: from images to geometric models. Springer Science & Business Media, 2012, vol. 26.

[58] D. Nistér, “An efficient solution to the five-point relative pose problem”, IEEE transactions on pattern analysis and machine intelligence, vol. 26, no. 6, pp. 756–770, 2004.

[59] K. G. Derpanis, “Overview of the ransac algorithm”, Image Rochester NY, vol. 4, no. 1, pp. 2–3, 2010.

[60] J.-F. Huang, S.-H. Lai, and C.-M. Cheng, “Robust fundamental matrix estimation with accurate outlier detection”, Journal of information science and engineering, vol. 23, no. 4, pp. 1213–1225, 2007.

[61] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography”, Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

[62] C. Feng and Y. Hung, “A robust method for estimating the fundamental matrix”.

[63] H. Gao, J. Xie, Y. Hu, and Z. Yang, “Hough-ransac: A fast and robust method for rejecting mismatches”, in Chinese Conference on Pattern Recognition, Springer, 2014, pp. 363–370.

[64] M. E. Fathy, A. S. Hussein, and M. F. Tolba, “Fundamental matrix estimation: A study of error criteria”, Pattern Recognition Letters, vol. 32, no. 2, pp. 383–391, 2011.

[65] R. I. Hartley and P. Sturm, “Triangulation”, Computer Vision and Image Understanding, vol. 68, no. 2, pp. 146–157, 1997.

[66] J. Mallon and P. F. Whelan, “Projective rectification from the fundamental matrix”, Image and Vision Computing, vol. 23, no. 7, pp. 643–650, 2005.


[67] R. A. Hamzah and H. Ibrahim, “Literature survey on stereo vision disparity map algorithms”, Journal of Sensors, vol. 2016, 2016.

[68] Tsukuba university computer vision dataset, [Online]. Available: https://home.cvlab.cs.tsukuba.ac.jp/dataset/ (visited on 05/31/2020).

[69] H. Hirschmuller, “Stereo processing by semiglobal matching and mutual information”, IEEE Transactions on pattern analysis and machine intelligence, vol. 30, no. 2, pp. 328–341, 2007.

[70] Opencv: Depth map from stereo images, [Online]. Available: https://docs.opencv.org/4.2.0/dd/d53/tutorial_py_depthmap.html (visited on 05/31/2020).

[71] Mozchops: Creating detailed character sheets, a case example, [Online]. Available: http://mozchops.com/articles-and-press/creating-detailed-character-sheets-case-example-2/ (visited on 05/31/2020).

[72] Documentation for opencv v.4.2.0, [Online]. Available: https://docs.opencv.org/4.2.0/ (visited on 05/31/2020).

[73] Middlebury stereo evaluation - version 3, [Online]. Available: http://vision.middlebury.edu/stereo/eval3/ (visited on 05/31/2020).

[74] C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, “The quickhull algorithm for convex hulls”, ACM Trans. Math. Softw., vol. 22, no. 4, pp. 469–483, Dec. 1996, issn: 0098-3500. doi: 10.1145/235815.235821. [Online]. Available: http://doi.acm.org/10.1145/235815.235821.

[75] F. Hausdorff, Grundzüge der Mengenlehre. Leipzig Verlag Von Veit & Comp., 1914.

[76] The stanford 3d scanning repository, [Online]. Available: https://graphics.stanford.edu/data/3Dscanrep/ (visited on 05/31/2020).

[77] K. Sharma, K.-y. Jeong, and S.-G. Kim, “Vision based autonomous vehicle navigation with self-organizing map feature matching technique”, in 2011 11th International Conference on Control, Automation and Systems, IEEE, 2011, pp. 946–949.
