Efficient Conservative Visibility Culling Using the Prioritized-Layered Projection Algorithm

14
Efficient Conservative Visibility Culling Using The Prioritized-Layered Projection Algorithm James T. Klosowski Cl´ audio T. Silva Abstract We propose a novel conservative visibility culling technique based on the Prioritized-Layered Projection (PLP) algorithm. PLP is a time-critical rendering technique that computes, for a given viewpoint, a partially correct image by rendering only a subset of the geometric primitives, those that PLP deter- mines to be mostly likely visible. Our new algorithm builds on PLP and provides an efficient way of finding the remain- ing visible primitives. We do this by adding a second phase to PLP which uses image-space techniques for determining the visibility status of the remaining geometry. Another contri- bution of our work is to show how to efficiently implement such image-space visibility queries using currently available OpenGL hardware and extensions. We report on the imple- mentation of our techniques on several graphics architectures, analyze their complexity, and discuss a possible hardware ex- tension that has the potential to further increase performance. CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation—Viewing Algorithms I.3.7 [Computer Graph- ics]: Three-Dimensional Graphics and Realism—Visible line/surface algorithms Keywords: Conservative visibility, occlusion culling, inter- active rendering 1 Introduction Interactive rendering of very large data sets is a fundamental problem in computer graphics. Although graphics processing power is increasing every day, its performance has not been able to keep up with the rapid increase in data set complexity. To address this shortcoming, techniques are being developed to reduce the amount of geometry that is required to be ren- dered, while still preserving image accuracy. Occlusion culling is one such technique whose goal is to determine which geometry is hidden from the viewer by other geometry. Such occluded geometry need not be processed by the graphics hardware since it will not contribute to the final image produced on the screen. Occlusion culling, also known as visibility culling 1 , is especially effective for scenes with high depth complexity, due to the large amount of oc- clusion that occurs. In such situations, much geometry can often be eliminated from the rendering process. Occlusion culling techniques are usually conservative, producing images IBM T. J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598; [email protected]. AT&T Labs-Research, 180 Park Ave., PO Box 971, Florham Park, NJ 07932; [email protected]. 1 Visibility culling is also used in a more general context to refer to all algorithms that cull geometry based on visibility, such as back-face culling, view frustum culling, and occlusion culling. (a) (b) (c) (d) Figure 1: Office model: (a) This image was computed using PLP and is missing several triangles. (b) The correct image showing all the visible triangles rendered with cPLP. (c) The current z-buffer, rendered as luminance, for the image in (a). Black/white represents near/far objects. (d) Final z-buffer for the correct image in (b). that are identical to those that would result from rendering all of the geometry. However, they can also be approximate techniques that produce images that are mostly correct, in ex- change for even greater levels of interactivity. The approxi- mate approaches are more effective when only a few pixels are rendered incorrectly, limiting any artifacts that are perceivable to the viewer. The Prioritized-Layered Projection (PLP) algorithm, intro- duced by Klosowski and Silva [16, 17], is one such example of an approximate occlusion culling technique. Rather than performing (expensive) conservative visibility determinations, PLP is an aggressive culling algorithm that estimates the vis- ible primitives for a given viewpoint, and only renders those primitives that it determines to be most likely visible, up to a user-specified budget. Consequently, PLP is suitable for gen-

Transcript of Efficient Conservative Visibility Culling Using the Prioritized-Layered Projection Algorithm

Efficient Conser vative Visibility Culling Using ThePrioritiz ed-Layered Projection Algorithm

JamesT. Klosowski�

ClaudioT. Silva†

Abstract

We proposea novel conservative visibility culling techniquebasedon thePrioritized-LayeredProjection(PLP)algorithm.PLPis a time-criticalrenderingtechniquethatcomputes,for agiven viewpoint, a partially correctimageby renderingonlya subsetof the geometricprimitives, thosethat PLP deter-minesto be mostly likely visible. Our new algorithmbuildson PLP andprovidesan efficient way of finding the remain-ing visibleprimitives.Wedo thisby addingasecondphasetoPLP which usesimage-spacetechniquesfor determiningthevisibility statusof the remaininggeometry. Another contri-bution of our work is to show how to efficiently implementsuchimage-spacevisibility queriesusingcurrentlyavailableOpenGLhardware andextensions. We reporton the imple-mentationof our techniqueson severalgraphicsarchitectures,analyzetheir complexity, anddiscussa possiblehardwareex-tensionthathasthepotentialto furtherincreaseperformance.

CR Categories: I.3.3 [ComputerGraphics]: Picture/ImageGeneration—Viewing Algorithms I.3.7 [Computer Graph-ics]: Three-DimensionalGraphics and Realism—Visibleline/surfacealgorithms

Keywords: Conservative visibility, occlusionculling, inter-active rendering

1 Intr oduction

Interactive renderingof very large datasetsis a fundamentalproblemin computergraphics.Althoughgraphicsprocessingpower is increasingevery day, its performancehasnot beenableto keepup with therapid increasein datasetcomplexity.To addressthis shortcoming,techniquesarebeingdevelopedto reducethe amountof geometrythat is requiredto be ren-dered,while still preservingimageaccuracy.

Occlusionculling is onesuchtechniquewhosegoal is todeterminewhich geometryis hiddenfrom theviewer by othergeometry. Such occludedgeometryneednot be processedby the graphicshardware since it will not contribute to thefinal imageproducedon the screen.Occlusionculling, alsoknown asvisibility culling1, is especiallyeffective for sceneswith high depthcomplexity, due to the large amountof oc-clusion that occurs. In suchsituations,much geometrycanoften be eliminatedfrom the renderingprocess. Occlusionculling techniquesareusuallyconservative,producingimages

�IBM T. J. Watson ResearchCenter, PO Box 704, Yorktown

Heights,NY 10598;[email protected].†AT&T Labs-Research,180 Park Ave., PO Box 971, Florham

Park,NJ07932;[email protected] culling is alsousedin amoregeneralcontext to referto

all algorithmsthatcull geometrybasedonvisibility, suchasback-faceculling, view frustumculling, andocclusionculling.

(a) (b)

(c) (d)

Figure1: Office model: (a) This imagewascomputedusingPLP and is missingseveral triangles. (b) The correctimageshowing all thevisible trianglesrenderedwith cPLP. (c) Thecurrentz-buffer, renderedasluminance,for the imagein (a).Black/whiterepresentsnear/far objects.(d) Final z-buffer forthecorrectimagein (b).

that are identical to thosethat would result from renderingall of the geometry. However, they canalsobe approximatetechniquesthatproduceimagesthataremostlycorrect,in ex-changefor even greaterlevels of interactivity. The approxi-mateapproachesaremoreeffectivewhenonly afew pixelsarerenderedincorrectly, limiting any artifactsthatareperceivableto theviewer.

ThePrioritized-LayeredProjection(PLP)algorithm,intro-ducedby Klosowski andSilva [16, 17], is onesuchexampleof an approximateocclusionculling technique. Ratherthanperforming(expensive)conservative visibility determinations,PLPis anaggressive culling algorithmthatestimatesthevis-ible primitivesfor a given viewpoint, andonly rendersthoseprimitivesthat it determinesto bemostlikely visible, up to auser-specifiedbudget.Consequently, PLP is suitablefor gen-

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 2

eratingpartially correctimagesfor usein a time-critical ren-deringsystem.To illustratethisapproach,considertheimagesof theoffice modelshown in Fig. 1. The imagegeneratedbyPLP for this viewpoint is shown in Fig. 1(a), while the cor-rectly renderedimageis in Fig. 1(b). We canseethat the im-agerenderedby PLP is fairly accurate,althoughportionsofthe modelaremissing,including the plant stand,clock, doorjam,andpartsof thedesklamp.

PLPworksby initially creatinga partitionof thespaceoc-cupiedby thegeometricprimitives.Eachcell in thepartitionisthenassigned,duringtherenderingloop,a probabilisticvalueindicating how likely it is that the cell is visible, given thecurrentviewpoint, view direction,andgeometryin theneigh-boringcells. The intuitive ideabehindthealgorithmis thatacell containingmuchgeometryis likely to occludethe cellsbehindit. At eachpoint of the algorithm, PLP maintainsapriority queue,alsocalledthe front, which determineswhichcell is most likely to be visible and thereforeprojectednextby the algorithm. As cells areprojected,the geometryasso-ciatedwith thosecells is rendered,until the algorithm runsout of time or reachesits limit of renderedprimitives. At thesametime, the neighboringcells of the renderedcell are in-sertedinto thefront with appropriateprobabilisticvalues.It isby schedulingtheprojectionof cellsasthey areinsertedin thefront thatPLPis ableto performeffectivevisibility estimation.

In [16, 17], PLPwasshown to beeffective at finding visi-ble primitivesfor reasonablysmallbudgets.For example,fora city modelcontaining500K triangles,PLPwasableto find(onaverage)90%of thevisible triangleswhile renderingonly10%of the total geometry. This numberalonedoesnot guar-anteethe quality of the resulting images,since the missing10% of the visible trianglescould occupy a very large per-centageof the screenor may be centrally locatedso that theincorrectpixelsarevery evidentto theviewer. To addressthisconcern,the authorsreportedthe numberof incorrectpixelsgeneratedby the PLP algorithm. In the worst case,for thesamemodelandviewpointsdiscussedabove,PLPonly gener-ated4% of thepixels incorrectly. Thesetwo statisticssupporttheclaim thatPLPis effective in finding visiblegeometry.

As mentionedpreviously, approximateocclusioncullingtechniqueswill sacrificeimageaccuracy for greaterrenderinginteractivity. While this tradeoff may be acceptablein someapplications(especiallythosethat demandtime-critical ren-dering),therearemany others(suchasmanufacturing,medi-cal,andscientificvisualizationapplications)thatcannottoler-atesuchartifacts.Theusersof theseapplicationsrequirethatall of the imagesgeneratedto becompletelyaccurate.To ad-dressthe requirementsof theseapplications,we describeanefficient conservative occlusionculling algorithmbaseduponPLP. Essentially, our new algorithm works by filling in theholesin the imagewherePLP madethe mistake of not ren-deringthecompletesetof visiblegeometry.

An interestingfact is thatafter renderingPLP’s estimationof the visible set,asshown in Fig. 1(a),mostof the z-buffergets initialized to somenon-default value, as illustrated byFig. 1(c). This figurecorrespondsto thez-buffer renderedasluminance,whereblackrepresentsnearobjects,andwhiterep-resentsfar objects.If we wereto renderthecells in the front(seeFig. 3), thevisible cellswould protrudethroughtheren-deredgeometry. The techniqueswe presentin this paperarebasedon incrementallycomputingwhich cells in PLP’s frontare occluded(that is, can not be “seen” throughthe currentz-buffer), andeliminatingthemfrom the front until the frontis empty. When this conditionholds,we know we have the

Figure 2: Illustration of the accuracy of PLP: For the sameviewpoint andmodelasshown in Fig. 1, thevisible geometrythatPLPrenderedis shown in white,andthevisiblegeometrythatPLPdid not renderis shown in red.

correctimage(1(b))andz-buffer (1(d)).The useof (two-dimensional)depthinformation to avoid

renderingoccludedgeometryis not a new idea. The Hierar-chicalZ-Buffer techniqueof Greeneetal. [14] is probablythebestknown exampleof a techniquethateffectively usessuchinformation.However, evenbeforethisseminalpaper, KubotaPacific alreadyhad hardware supporton their graphicssub-systemfor visibility queriesbasedon thecurrentstatusof thedepthbuffer. In Section5, wewill putournew techniquesintocontext with respectto therelevantrelatedwork.

Themaincontributionsof ourwork are:

� WeproposecPLP, anefficient interactiverenderingalgo-rithm thatworksasanextensionto thePLPalgorithmbyaddingasecondphasewhichusesimage-spacevisibilityqueries.

� Weshow how to efficiently implementsuchimage-spacevisibility queriesusingavailableOpenGLhardwareandextensions. Our implementationtechniquescanpoten-tially beusedin conjunctionwith otheralgorithms.

� We discussthe performanceand limitations of currentgraphicshardware, andwe proposea simple hardwareextension that could provide further performanceim-provements.

Theremainderof our paperhasbeenorganizedasfollows.In Section2, after a brief overview of PLPandsomeaspectsof its implementation,we detail our new cPLP algorithm.We presentseveral techniquesfor the implementationof ourimage-spacevisibility queriesusingavailableOpenGLhard-wareandextensionsin Section3. We alsoproposea simplehardwareextensionto furtherimproverenderingperformance.In Section4 we reporton theoverall performanceof thevari-oustechniquesonseveralgraphicsarchitectures.In Section5,weprovideabrief overview of thepreviouswork onocclusionculling, followedby a morethoroughcomparisonof our cur-rentalgorithmwith themostrelevantprior techniques.Finally,weendthepresentationwith someconcludingremarks.

2

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 3

2 The Conser vative PLP Algorithm

The conservative PLP algorithm (cPLP) is an extensiontoPLP which efficiently usesimage-spacevisibility queriestodevelop a conservative occlusionculling algorithmon top ofPLP’s time-critical framework. In this section,we briefly re-view the original PLP algorithm and then presentour cPLPalgorithm. Our image-spacevisibility queries,a crucial partof theimplementationof cPLP, arediscussedin Section3.

2.1 Overview of PLP

Prioritized-LayeredProjectionis a techniquefor fast render-ing of high depthcomplexity scenes.It works by estimatingthevisible primitivesin a scenefrom a givenviewpoint incre-mentally. At theheartof thePLPalgorithmis aspace-traversalalgorithm, which prioritizes the projectionof the geometricprimitivesin sucha way asto delayrenderingprimitivesthathave a small likelihood of being visible. Insteadof explic-itly overestimatingthe visible setof primitives,asis doneinconservative techniques,thealgorithmworksonabudget.Foreachviewpoint, theviewercanprovideamaximumnumberofprimitivesto be renderedandthealgorithmwill deliver whatit considersto be the setof primitiveswhich maximizestheimagequality, basedupona visibility estimationmetric. PLPconsistsof anefficient preprocessingstepfollowedby a time-critical renderingalgorithmasthedatais beingvisualized.

PLPpartitionsthespacethatcontainstheoriginal inputge-ometry into convex cells. During this one-timepreprocess-ing, the collection of cells is generatedin sucha way as toroughly keepa uniform densityof primitives per cell. Thissamplingleadsto large cells in unpopulatedareasandsmallcells in denselyoccupiedareas.Originally, the spatialparti-tioning usedwasa DelaunayTriangulation[16]; however, anoctreehasrecentlybeenshown in [17] to be a more effec-tive datastructure,bothin termsof efficiency andeaseof use.Sinceanoctreeis actuallya hierarchyof spatialnodesasop-posedto a disjoint partition,we only utilize thesetof all leafnodesof theoctree,sincethesedo providesucha partition.

Using the numberof geometricprimitives containedin agivencell,asolidityvalueρ isdefined,whichrepresentsthein-trinsic occlusionthat this cell will generate.During thespacetraversalalgorithm,solidity valuesare accumulatedby cellsbasedupon the current viewing parameters(viewpoint andview direction),aswell as the normalof the facesharedbyneighboringcells.Usingtheseaccumulatedvalues,thetraver-salalgorithmprioritizeswhich cellsaremostlikely to bevis-ible andthereforeshouldbe projected.For a completetreat-mentof thesecalculations,pleasereferto [16, 17].

Startingfrom the initial cell which containstheviewpoint,PLPattemptsto carve cellsout of thetessellation.It doesthisby alwaysprojectingthe cell in the front

�(the front is the

collection of cells that are immediatecandidatesfor projec-tion) thatis leastlikely to beoccludedaccordingto its solidityvalue. Initially, the front is empty, andascells are inserted,their accumulatedsolidity valuesare estimatedto reflect itspositionduring the traversal.Every time a cell in the front isprojected,all of thegeometryassignedto it is rendered.

2.2 The cPLP Algorithm

As previously mentioned,the cPLPalgorithmis built on topof PLP. The basic idea is to first run PLP to renderan ini-tial, approximateimage. As a side effect of renderingthis

image, two further structureswill be generatedthat we canexploit in cPLP: � i � thedepthbuffer correspondingto theap-proximateimage,and � ii � PLP’s priority queue(front), whichcorrespondsto the cells of thespatialpartition thatwould berenderednext by PLP if it hadmoretime. In cPLP, we williteratively usethe depthbuffer to effectively cull the cells inthe front until all of the visible geometryhasbeenrendered.Thegeneralideacanbesummarizedasfollows:

(1) RunPLPusinga smallbudgetof geometricprimitives.

Thisstepgeneratesapartiallycorrectimagewith “holes”(regions of incorrect pixels), the correspondingdepthbuffer, andthepriority queue(front) containingthecellsthatwould beprojectednext.

(2) While the front is not empty, perform the followingsteps:

(2a) Given the current front, determinewhich cellsareoccluded,usingimage-spacevisibility queries,andremove themfrom thefront.

(2b) ContinuerunningPLP, sothateachcell in thecur-rent front getsprojected,sincewe know that theyarevisible.During thisphase,new cellsthatneighborthepro-jectedcells are insertedinto the front as before.However, weterminatethisiterationonceall of theoriginal cellsin thefront have beenprojected.

As cells are renderedin step (2b), the holes (and thedepthbuffer) get filled in, until the imageis complete.A nicefeatureof cPLPis thatwe know we aredoneex-actlywhenthefront is empty.

Oneadvantageof theformulationgivenabove is thatcPLPis ableto performseveral visibility queriesduringeachitera-tion. At thesametime, themaincomplicationin implement-ing cPLPcomesfrom thevisibility queriesin step(2a). Thisis furtherdiscussedin Section3.

2.3 Challeng es

Thereareprimarily threeobstaclesthatcPLPmustovercometo bea conservative, interactive renderingalgorithm. It muststartwith a goodestimationof the correctimage,determinewhich regionsof theestimationareincorrect,andfind there-mainingvisible geometry. Of course,to be truly interactive,eachof the solutionsto thesechallengesmustbe performedvery efficiently. This canbedonethanksto theway PLPwasdesigned.We discusseachof theseissuesbelow.

Estimating the image As demonstratedin [16, 17], PLPis very effective in finding thevisible polygonsandcorrectlyrenderingthe vastmajority of pixels, even whenusing rela-tivelysmallbudgets.To illustratethispoint,Figs.1(a)and1(b)show imagesof an office modelfor the PLP andcPLPalgo-rithms.PLPwasfairly successfulin findingmostof thevisiblegeometryfor this viewpoint. To bettervisualizetheaccuracyof PLP, Fig. 2 highlights the visible geometrythat PLP ren-deredin white,andthevisiblegeometrythatPLPdid not ren-der in red. By taking full advantageof the accuracy of PLP,our conservative algorithmcanquickly obtaina goodestima-tion of thecorrectimage.

3

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 4

Figure 3: The currentthe front is highlightedin green. Bydeterminingwherethe front is still visible, it is possibletolocalizetheremainingwork doneby our renderingalgorithm.

This featurecanalsobeusedto potentiallyspeed-upotherocclusionculling techniques(suchasthosein [25, 33]), whichrely onusingthez-buffer valuesto cull geometry.

Finding the holes As PLPprojectscells(andrendersthegeometryinside thesecells), it maintainsthe collection ofcells that are immediatecandidatesfor projectionin a prior-ity queue,called the front. Clearly, as the primitives in thescenearerendered,partsof thefront getoccludedby theren-deredgeometry. In Fig. 3, we illustratethis exacteffect. If no“green” (thecolor thatweusedfor thefront) werepresent,theimagewould be correct. In general,the imagewill be com-pleted,andrenderingcanbe stopped,after all of the cells inthe front are occludedby the renderedprimitives. Thus, tofind theholesin the estimatedimage,we needonly considerthecellsin thefront.

Filling the holes The final piece that we needto buildcPLP is how to completethe renderingoncewe know whatpartsof the front arestill visible. For this, it is easierto firstconsiderthecurrentoccludedpartof the front. Basically, wecanthink of theoccludedfront asasingleoccluder(seeFig.4)that hasa few holes(correspondingto the greenpatchesinFig. 3). Thinking analogouslyto the work of Luebke andGeorges[19], theholescanbe thoughtof as“portals”, or re-ducedviewing frusta,throughwhichall of theremainingvisi-ble geometrycanbeseen.An equivalentformulationis to in-crementallydeterminewhatcellsbelongto thesesmallerviewfrustaby usinganefficient visibility query(discussedbelow).

3 Implementing Visibility Queries

As previously discussed,to extendPLPinto a conservative al-gorithm, we needto efficiently determinewhich cells in thefront are visible. The visibility querieswill take place inimage-spaceandwill utilize the currentdepthbuffer. In thissection,we first describethreetechniquesfor implementingthesequeriesusing available OpenGL hardware and exten-sions. Theseinclude using a hardware featureavailable on

Eye

Front

Visible

Visible

Occluded

Figure4: This figure illustratesthe techniqueusedin findingthe remainingvisible cells in cPLP. Thesecells arefoundbylimiting theremainingwork doneby thealgorithmto only thevisible regions.

somegraphicsarchitectures(suchas someHewlett-Packard(HP) andSilicon Graphics(SGI) graphicsadapters),an item-buffer techniquethat requiresonly the capability of readingbackthecolor buffer, andanalternative approachthatusesanextensionof OpenGL1.2. Then,we discusssomefurtherop-timizationtechniques.Finally, we endthis sectionby propos-ing a new hardwareextensionthat hasthe potentialto speedupvisibility queriesevenfurther.

3.1 Counting Fragments After Depth Test

Onetechniquefor performingthe visibility queriesof cPLPis to usetheHP occlusionculling extension,which is imple-mentedin their fx seriesof graphicsaccelerators.This propri-etaryfeature,which actuallyseemsquitesimilar to thecapa-bilities of the KubotaPacific Titan 3000reportedby Greeneet al. [14], makesit possibleto determinethevisibility of ob-jectsascomparedto the currentvaluesin the z-buffer. Theideais to adda feedbackloop in the hardwarewhich is ableto check if changeswould have beenmadeto the z-bufferwhenscan-converting geometricprimitives. Theactualhard-warefeatureasimplementedon the fx seriesgraphicsaccel-eratorsis explainedin further detail in [25, 26]. Thoughnotwell-known, severalothervendorsprovide thesamefunction-ality. Basically, by simply addinginstrumentationcapabilitiesto thehardwarewhich areableto countthe fragmentswhichpassthe depth test, any architecturecan be efficiently aug-mentedwith suchocclusionculling capabilities. This is thecasefor theSGIVisualWorkstationserieswhichhavedefinedan extensioncalledGL SGIX depth pass instrument[27, pages72–75]. Several new graphicsboards,suchastheSGIInfiniteReality3 andtheDiamondFireGLhavesuchfunc-tionality. Even low-costPC cardssuchas the 3Dfx Voodoographicsboardshave hadsimilar functionality in their Glidelibrary (basicallyby supportingqueriesinto thehardwarereg-isters).Sincethe functionalityproposedby thedifferentven-dorsis similar, in therestof this paper, we concentrateon theHP implementationof suchocclusionculling tests.

4

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 5

Onepossibleuseof thishardwarefeatureis to avoid render-ing a very complex objectby first checkingif it is potentiallyvisible. Thiscanbedoneby checkingwhetheraboundingvol-umebv, usuallytheboundingbox of theobject,is visible andonly renderingthe actualobject if bv is visible. This canbedoneusingthefollowing fragmentof C++ code:

glEnable(GL_OCCLUSION_TEST_HP);glDepthMask(GL_FALSE);glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);DrawBoundingBoxOfObject();bool isVisible;glGetBooleanv(GL_OCCLUSION_RESULT_HP, &isVisible);glDisable(GL_OCCLUSION_TEST_HP);glDepthMask(GL_TRUE);glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);if (isVisible)DrawGeometryofObject();

Thiscapabilityis exactlywhatis requiredby ourcPLPvisi-bility queries.Giventhecurrentz-buffer, weneedto determinewhatcellsin thefront arevisible. It is a simpletaskto usetheHPhardwareto querythevisibility statusof eachcell.

TheHPocclusionculling featureis implementedin severalof theirgraphicsaccelerators,for example,thefx6 boards.Al-thoughperformingour visibility queriesusing the HP hard-wareis very easy, theHP occlusionculling testis not cheap.In anHP white paper[26], it is estimatedthatperforminganocclusionquerywith aboundingboxof anobjecton thefx6 isequivalentto renderingabout19025-pixel triangles.Our ownexperimentson an HP Kayak with an fx6 estimatesthe costof eachquerybeinghigher. Dependinguponthe sizeof theboundingbox,it couldrequireanywherebetween0.1millisec-onds(ms)to 1 ms.This indicatesthata naive approachto vis-ibility culling, whereobjectsareconstantlycheckedfor beingoccluded,might actually hurt performance,and not achievethe full potentialof thegraphicsboard. In fact, it is possibleto slow down the fx6 considerablyif oneis unlucky enoughto project the polygonsin a back-to-frontorder, sincenoneof the boundingboxeswould be occluded. In their mostre-centofferings,HP hasimproved their occlusionculling fea-tures. The fx5 andfx10 acceleratorscanperformseveral oc-clusionculling queriesin parallel [9]. Also, HP reportsthattheirOpenGLimplementationhasbeenchangedto usetheoc-clusionculling featuresautomaticallywhenever feasible.Forexample,prior to renderinga largedisplaylist, their softwarewould actuallyperforman occlusionquerybeforerenderingall of thegeometry.

Utilizing theHP occlusionculling featurehasprovento bethesimplestandmostefficientof our threetechniquesfor per-forming thevisibility queriesneededby cPLP. Unfortunately,at this time, this hardware featureis not widely available inothergraphicsboards(for instance,neitherof market leadersNvidia or ATI supportthis feature).Becauseof this, we nextdescribea simpleitem-buffer technique,whoseonly require-ment is the capability to readback the color buffer. In Sec-tion 3.6,we proposea simpleextensionof theOpenGLfunc-tionality whichextendsthefragment-countingidea,by addingcomponentsof thetechniquesdescribednext.

3.2 An Item Buff er Technique

It is possibleto implementvisibility queriessimilar to theonesprovided by the HP occlusionteston genericOpenGLhard-ware.Thebasicideais to usethecolorbuffer to determinethe

visibility of geometricprimitives. For example,if onewouldliketo determineif agivenprimitive is visible,onecouldclearthecolorbuffer, disablechangesto thez-buffer (but not theac-tualz test),andthenrenderthe(boundingboxof the)primitivewith a well-known color. If that color appearsduring a scanof the color buffer, we know that someportion of the primi-tive passedthez test,which meansthe(boundingbox of the)primitive is actuallyvisible.

Thereare two main costsassociatedwith the item-buffertechnique: transferringthe color buffer from the graphicsadapterto the hostcomputer’s main memoryandthe time ittakestheCPUto scanthecolor buffer. The transfercostcanbesubstantialin comparisonto thescanningcost(seeTable2).Consequently, it is muchmoreefficient to do many visibilityqueriesat once.By coloringeachof thecellsin thefront withdifferentcolors,it is possibleto performmany queriesat thesametime.

An unwantedsideeffect of checkingmultiple cells is thata cell, C, in the front may be occludedby othercells in thefront, asopposedto thecurrentz-buffer which containsdepthinformation for the previously rendered geometry. This is aproblembecausealthoughcellC is occludedby theothercellsin thefront, thegeometrycontainedwithin cell C maynot beoccludedby thegeometrywithin theothercells.A multi-passalgorithmis thereforerequiredto guaranteethatacell is prop-erly marked as occluded. Initially, all cells in the front aremarkedas“potentiallyvisible”. Wealsodisablewriting to thez-buffer, sothat it remainsaccuratewith respectto thegeom-etry previously renderedby PLP. Eachpassof the algorithmthenclearsthecolorbuffer andrenderseachof thecellsin thefront that is potentiallyvisible usinga distinctcolor. We thentransferandscanthecolorbuffer to determinewhich cellsareactuallyvisible andmark them. Iterating in this fashion,wecandetermineexactly which cells arevisible with respecttothepreviously renderedgeometry. Theremainingcellsarede-terminedto beoccludedby thepreviously renderedgeometryandneednot beconsideredfurther. Themulti-passalgorithmterminatesoncethe color buffer scanindicatesthat noneoftherenderedcells,for thecurrentpass,weredeterminedto bevisible. Thatis, thecolorbuffer is completelyemptyof all col-ors.Notethatpotentiallyvisiblecellswill needto berenderedmultiple times. Eachtime a cell is found to bevisible in onepass,it is markedandnot renderedagain.Pseudo-codefor theitem-buffer techniqueis includedbelow.

glDepthMask(GL_FALSE);for each cell c in front

markCellPotentiallyVisible(c);bool done = false;while (!done)

glClear(GL_COLOR_BUFFER_BIT);for each cell c in front

if (potentiallyVisible(c))renderCell(c);

glReadPixels(0, 0, width, height,GL_RGBA, GL_UNSIGNED_BYTE, visible_colors);

int cnt = 0;for each cell c that appears in visible_colors

markCellVisible(c);cnt++;

if (cnt == 0)done = true;

5

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 6

3.3 The OpenGL Histogram Extension

Theitem-buffer techniquejustproposedperformsa lot of datamovementbetweenthegraphicsaccelerator’smemoryandthehost computer’s main memory. On most architectures,thisis still a very expensive operation,sincethe datamust flowthroughsomesharedbuswith all of theothercomponentsinthecomputer. Weproposeadifferenttechniquewhichusesin-trinsicOpenGLoperationsto performall thecomputationsonthegraphicsaccelerators,andonly move a very smallamountof databackto thehostCPU.

Our new techniquesharessomesimilarity to the previousitem-buffer technique. For instance,it also needsto renderthe potentially visible cells multiple times, until no visiblecell is found. However, the new methodusesOpenGL’s his-togrammingfacility, availablein theARB imagingextensionof OpenGL1.2,to actuallycomputethevisiblecells(see[1]).After renderingthepotentiallyvisiblecellsin this case,ratherthantransferringthecolor buffer to thehost’s CPUandscan-ning it for the visible cells, we simply enablethe histogram-ming facility and transferthe color buffer into texture mem-ory (still on the graphicsaccelerator). During this transfer,OpenGLwill computethenumberof timesa particularcolorappears.A shortarraywith theaccumulatedvaluescanthenbefetchedby thehostCPUwith a singlecall. A fragmentofourC++ codeillustratesthis approach.

glEnable(GL_TEXTURE_2D);glEnable(GL_HISTOGRAM_EXT);glHistogramEXT(GL_HISTOGRAM_EXT, 256,

GL_LUMINANCE, GL_TRUE );glCopyTexSubImage2D(GL_TEXTURE_2D, 0,

0, 0, WIDTH, HEIGHT, WIDTH, HEIGHT);GLuint histogram_values[256];glGetHistogramEXT(GL_HISTOGRAM_EXT, GL_FALSE,

GL_LUMINANCE, GL_UNSIGNED_INT,histogram_values);

glResetHistogramEXT ( GL_HISTOGRAM_EXT );glDisable(GL_TEXTURE_2D);glDisable(GL_HISTOGRAM_EXT);

After thiscodeis executed,thearrayhistogramvaluescon-tainsthenumberof timeseachcolor (herecountedfrom 0 to255) appeared.With this technique,the graphicsboarddoesall the work, andonly transfersthe resultsto the hostCPU.Thesameterminationcriterionexistsfor this multi-passalgo-rithm asfor the item-buffer technique,althoughwe canmoreeasilytestfor this conditionin this case.For instance,if his-togramvalues[0]is equalto WIDTH � HEIGHT, meaningallpixelsarethesame(background)color, thenno cellsarevisi-bleandweterminatethealgorithm.

3.4 Impr oving Visibility Query Performance

It is possibleto improve the performanceof our visibilityquerytechniquesby implementingseveraloptimizations.Theprevioustwo techniquesneedto performoperationsthattouchall thepixels in the image,possiblymultiple times. To avoidcomputationsin areasof the screenthat have alreadybeencompletely covered, we have implementeda simple tilingschemethatgreatlyreducestheamountof transfersandscansrequired. The basicidea is to simply divide the screenintofixed tiles. For a 512x512pixel image,we could breakthescreenup into 64 tiles, eachcontaininga block of 64x64pix-els. During the multi-passalgorithm,we needto keeptrack

of the active tiles, thosethat in the previous iteration con-tainedvisible primitives. After eachiteration, tiles get com-pleted,andthe numberof tiles which needto be renderedtoandscanneddecreases.

Anothersimpleoptimizationfor the item-buffer techniquewasto minimizethenumberof colorchannelsto transferto thehostcomputer’s mainmemory. For example,if we have r bitsto representtheredcolor componenton our machine,andwefewerthan2r cellsto checkin thefront, wecanuniquelycolorthesecellsusingonly theredcolorcomponent.Consequently,we would only needto transferandscanthe GL RED com-ponentfor eachpixel in theimage,asopposedto transferringandscanningtheentireGL RGBA component.

We have implementedand are currently using thesetwooptimizations.A non-conservative optimizationfor our tech-niqueswould be to computevisibility in a lower resolutionthanthe actualrenderingwindow [33]. Although a quite ef-fective optimization,this might leadto undesirableartifacts.This is oneof thereasonswe donotuseit in oursystem.

3.5 Integration with cPLP

The techniquespresentedso far essentiallysolve step(2a)ofcPLP. Boththeitem-buffer techniqueaswell asthehistogram-ming techniqueneedto have accessto thecolor buffer of themachinebeingusedfor its computations.For eachpass,theyrequirethat the color buffer be cleared,which conflictswiththe imagecomputationwhich is performedin steps(1) and(2b). Naively, it wouldbenecessaryto savethecompletecolorbuffer (or at leasttheactive tiles) beforeeachcall to step(2a)andrestoreit beforethecall to step(2b).

Instead,sincewe expectthatafterstep(1) mostof thevis-ible triangleshave beenrendered,we simply save the imagestep(1) generated,andignorethechangesto thecolor bufferfrom thenon (we re-renderedthe extra geometryin the endto recover the correctimage). The importantthing is to cor-rectly accountfor the z-buffer changesthat are triggeredbytherenderingof thegeometryinsidethecells. To do this, be-fore step(2b), we changethemaskson thez-buffer so that itgetsupdatedasgeometryis renderedin (2b). Whenthe frontbecomesempty, we know thez-buffer wascompleted.At thatpoint, we performa singleimagerestore(with the imagewesavedin step(1)), andwe re-renderall thegeometrythatwasfoundto bevisiblesincethatpoint.

Fig. 10 providesanoverview of our cPLPalgorithmasde-scribed. For a sampleview of an office model, snapshotswere taken at several iterations (step 2) of our algorithm.Figs.10(a)-(c)illustratethecurrentcolor buffer andfront (inblue) at eachiteration. The remainingvisible geometrywillcomefrom within thevisible front cells. (d)-(f) illustratethetiles of thescreenthathave beencompletedandthereforedonot needto be scannedduring subsequentiterations. (c) and(f) correspondto thefinal (correct)image,sinceall of thetileshave beencompletedcovered.Notethatin (b), thefront cells,which arebarelyvisible, arein theupperleft cornerandnearthe two desksin the middle of the screen.As expected,thetiles thatrepresenttheseareasarenot markedascompleted.

3.6 Extending the OpenGL Histogram

Herewe proposea modificationto OpenGLthat hasthe po-tentialto greatlyimprove performance.In particular, it wouldmake it possibleto avoid thecostlymulti-passvisibility com-

6

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 7

Host Memory Pixel Storage Ops RasterizationPixel Transfer Ops Per-fragment Ops Framebuffer

Texture Memory

Figure5: OpenGLimagingpipeline

putationsthatwearecurrentlyforcedto use,andit canbeseenasa generalizationof theHPocclusionculling test.

OpenGL backgr ound Beforewe go into details,it helpsto understanda bit moreon how OpenGLworks. Thegraph-ics pipelineis thetermusedfor thepatha particularprimitivetakesin thegraphicshardwarefrom thetimetheuserdefinesitin 3D to thetimeit actuallycontributesto thecolorof apartic-ular pixel on thescreen.At a veryhigh level, aprimitive mustundergo severaloperationsbeforeit is drawn on thescreen.Aflowchartof theoperationsis shown in Fig. 5.

Theuserhasseveraloptionsfor specifyingverticesthataregroupedinto primitives,e.g.,trianglesor quads.Primitivesgothroughseveral stages(not shown), andeventually, get to therasterizationphase. It is at rasterizationthat the colors andother propertiesof eachpixel are computed. During raster-ization, primitives get broken into what we usually refer toas “fragments”. Moderngraphicsarchitectureshave severalper-fragmentoperationsthat canbe performedon eachfrag-mentasthey aregenerated.As fragmentsarecomputed,theyarefurtherprocessed,andthehardwareincrementallyfills theframebuffer with animage.

Per-Fragment Histogramming TheOpenGLhistogram-ming facility, part of the pixel transferoperationsshown inFig. 5, operateson images,which canpotentiallycomefromthe framebuffer. The OpenGLhistogramworks by countingthenumberof timesa color appearsin a givenimage.

The reasonwe needto perform multiple passesto deter-mine whencells arevisible at this time is that we areusingthe color buffer to find which of the primitivespassedthe z-test. With the standardpipeline,we only get the “top layer”of visible cells, sinceoneof theper-fragmentoperationsthatoccursbeforeapixel is written to thecolorbuffer is thedepth-test. If a per-fragmenthistogrammingfacility is addedto thepipelineandit couldbeusedto performthesameexactopera-tion on fragments(whichpassthez-test),it wouldbepossibleto counthow many fragmentsof a givenprimitive passedthez-test.If thisnumberis zero,theprimitivewouldbeoccluded,otherwise,the histogramvaluewould not only tells us that itis visible, but actually provide an upperboundon the num-berof its pixels thatarevisible. With theproposedchangeinthe OpenGLpipeline,we would still be ableto performsev-eralqueriesat thesametime,but we would not berequiredtoperformmultiple passesover theframebuffer.

The per-fragment histogramming functionality we areproposingis a cleanway to extendthe (alreadyuseful)tech-niquesbasedoncountingthenumberof fragmentswhichpassthez-test(suchastheHP occlusionculling test),so that it isableto handlemultipleandmoregeneraltestswith betterper-formance.We would like to point out that the hardwarecost(in componentcostor chip area)would likely be non-trivial,sincehigh-performancegraphicshardware is highly parallel

(for instance,Nvidia’s GeForcecancomputefour fragmentssimultaneously),andtheextra hardwarefor theper-fragmenthistogrammingwould have to bereplicatedfor eachfragmentgenerator. Of course,this is alreadythecasefor severalotherextensions,including the existing fragment counting hard-ware. We believe the actualcost(in time) of our augmentedtestwould besimilar to thecostof a singleHP test,while wewouldbeableto performseveraltestsconcurrently.

4 Experimental Results

We performeda seriesof experimentsto determinetheeffec-tivenessof ournew cPLPalgorithm.Wereportresultsfor eachof thethreeimplementationsof ourvisibility queriespresentedin Section3, aswell asseveralalternativesfor benchmarking:

cPLP-HP: cPLP, usingtheHPocclusionculling extension,

cPLP-IB: cPLP, usingtheitem-buffer technique,

cPLP-HG: cPLP, usingtheOpenGLhistogramextension,

cPLP-EXT: cPLP, usingourhardwareextensionproposedinSection 3.6,

PLP: theoriginalPLP,

VF-BF: view frustumandback-faceculling only,

HP: usingtheHP hardwareto performthe visibility querieswithout the benefitof runningPLP to preloadthecoloranddepthbuffers.

Test model Theprimarymodelthatwe reportresultson isshown in Fig. 9(a)andconsistsof threecopies,placedsidebyside,of the third floor of the Berkeley SODA Hall. Arrang-ing the copiesin sucha way helpsus betterunderstandhowthe different occlusionculling techniquesfunction in a highdepthcomplexity environment,sincethey have their greatestopportunitywherethenis significantocclusion.Eachroominthemodelhasvariouspiecesof furnitureandin total,thethreereplicascontainover onemillion triangles.

We generateda 500-framepath that travels right-to-left,startingfrom theupperright cornerof Fig. 9(a). In Fig. 9(b)–(e), we show a few representative framesof the path. Thenumberof visiblepolygonsin eachframevariesconsiderably,especiallywhenmoving from roomto room.

Machine architectures Ourexperimentswereperformedon a three different architectures:an SGI Octane,an SGIOnyx, andanHP Kayak. Theconfigurationsof themachinesarelistedin Table1.

7

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 8

Machine CPU(s) Graphics RAM

SGIOctane 1 X R12000,300MHz MXE 512MBSGIOnyx 12 X R10000,195MHz Infinite Reality 2GBHP Kayak 2 X PentiumII, 450MHz fx6 384MB

Table1: The configurationsof the machinesusedin our ex-periments.Thenumberof processorsP permachineis listedin theCPU(s)column,in theform: PX cpu-type,cpu-speed.

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 20 40 60 80 100

Ave

rage

Ren

derin

g T

ime

(sec

onds

)

PLP Triangle Budget (thousands)

Octane cPLP-IBOctane cPLP-HG

Onxy cPLP-IBOnxy cPLP-HGKayak cPLP-HP

Figure6: Averagerenderingtimes per frame for the imple-mentationsof thecPLPalgorithm. ThePLPbudget,reportedin thousandsof triangles,determinesthe numberof trianglesinitially renderedto fill-in thedepthbuffer.

Prepr ocessing As discussedin Section2, thepreprocess-ing stepof cPLP, which is identicalto thepreprocessingstepof the original PLP algorithm, is very efficient. The prepro-cessingincludesreadingtheinputgeometryfrom afile, build-ing theoctree,determiningwhichgeometryeachcell contains,andcomputingtheinitial solidity values.Thetotalpreprocess-ing timesfor theonemillion trianglemodelmentionedabovewas76seconds,128seconds,and90 seconds,for theOctane,Onyx, andKayak,respectively. While thesetimesareactuallyquitemodest,we have anadditionalopportunityto reducethepreprocessingrequirement.For portability purposes,we arecurrentlyusinganASCII formatto storethemodel.For eachof thethreemachinesbeingused,at leasthalf (42,64,and56seconds,respectively) of the preprocessingtime listed abovewasspentsimply readingin themodel.If wewereto storethemodelin acompactbinaryformat,theinputportionof thepre-processingwould likely be reducedconsiderably. Theoctreeconstruction,geometryassignment,and initial solidity com-putationonly required34, 64, and34 seconds,respectively,on eachof thethreemachines,andcouldlikely bereducedbycarefullyoptimizingourcode.

Rendering results Wepresentour mainrenderingresultsfor the variouscPLPimplementationsin Fig. 6. Theverticalaxisrepresentstheaveragerenderingtime for eachof the500stepsin thepathgeneratedfor the testmodel. Thehorizontalaxis representsthe initial budgetusedby PLPto renderwhatit determinedto be the most likely visible geometry, thereby

preloadingthecoloranddepthbuffers.If we comparethe item-buffer and histogramtechniques,

we seethat the item-buffer is considerablyfasteron eachofthe SGI machines. All of theseruns2 tendedto reachtheirminimum valuesfor an initial PLP budgetof 25K triangles,or roughly 2.5% of the total numberin the model. For thisbudget,the renderingtimes for the item-buffer techniqueontheOctaneandOnyx were0.081and0.113secondson aver-ageperframe. This is equivalentto rendering12.35and8.85framespersecond,respectively. In comparison,thehistogramapproachtook 0.164and0.178secondsonaverageperframe,or theequivalentof 6.10and5.62renderedframespersecond.

We did not run cPLP-HGon theKayaksincetheOpenGLhistogramextensionis not availableon that machine. Also,thecPLP-IBtechniqueon theKayakwasvery slow, requiring0.864secondson averageper frame. We explain why this isthecasewhenwe discussthecostsof theprimitive operationsfor eachof thetechniquesbelow. TheHP hardwareocclusionculling extensionwasclearlynotavailableontheSGIs,andsowecanonly reporton this techniqueon theKayak.

cPLP-HPwas the mostefficient algorithm but we werealittle surprisedby the fact that it increasedin runningtime aswe increasedthe PLP budget. We anticipatedthat we wouldseea paraboliccurve similar to the runson the two SGI ma-chines.Initially, we consideredthatrunningPLPfollowedbyourcPLP-HPvisibility querieswasnotbenefittingusat all onthe Kayak. To test this hypothesis,we implementedanothertechnique,HP, that usedthe hardware occlusionculling ex-tensionwithout thebenefitof runningPLPfirst to preloadthedepthbuffer. Given the set of leaves in our octree,we firstdiscardedthosenodesthatwereoutsidetheview frustum,andthen sortedthe remainingnodesaccordingto their distancefrom theviewpoint. We thenperformedvisibility queriesforthenodesin thisorder. Onaverage,theHPtechniquerequired0.157secondsper frame,which is considerablyslower thanourcPLP-HPalgorithm.

While sortingthe nodesaccordingto distanceappearedtobe a goodtechnique,it clearly cannotcaptureany occlusioninformationasdid cPLP. In addition,this HP techniquedoesnot have a mechanismfor determiningwhich nodesarestillvisible andwhich sectionsof the screenareyet incomplete.Consequently, this methodcannoteasily determinewhen itis finished,andthereforemustperformmany morevisibilityqueriesthanthecPLP-HPtechnique.Onecouldthink of mod-ifying this HP approachsothatthequeriesareperformedin ahierarchicalfashionsincewe have theoctreeconstructedany-way. However, while in somecasesthiscouldreducetheover-all renderingtime, in many othersthetimeswill increasedueto the increasein the numberof visibility queries. We shalldiscussshortlythetimesrequiredfor theHPvisibility queries.Thus,althoughthebenefitgainedfrom PLPwasnotexactlyasweanticipated,it still playsacrucialrole in achieving interac-tive renderingtimes.

To quantifyhow well our conservative culling algorithmisworking, we implementeda simplerenderingalgorithm,VF-BF, thatperformedonly view frustumandback-faceculling.Thesetraditional culling approacheswere also usedwithincPLP. TheVF-BF algorithmis considerablyslower thanall ofthecPLPimplementations.For example,on theOctane,VF-BF took0.975secondsto rendereachframeonaverage.Thus,our cPLP-IB andcPLP-HGmethodsrenderframes12 and6

2The only exceptionbeing the OctanecPLP-HGmethod,whichreachedaminimumataPLPbudgetof 50K triangles.

8

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 9

Machine SGIOctane SGI Onyx HPKayakImageSize 642 5122 642 5122 642 5122

Transfer 217 4483 564 7733 375 11250Scan 30 2300 20 1000 47 3430

Total 247 6783 584 8733 422 14680

Table2: Timesfor theprimitive operationsof the item-buffertechnique. An imagesize of 642 refersto an imagethat is64x64pixels in size.Thetransfertime is thedominantcostofthis method.All timesarereportedin microseconds.

timesfasterthantheVF-BF technique.Our cPLP-HPmethodprovidesevenbettercomparisons.Suchimprovementsin ren-deringspeeds,which weresimilar on all of thearchitectures,arecrucialfor any applicationrequiringinteractivity.

Of the time spentby our cPLPapproaches,a goodportionof that time wasactuallyspentrunningthe initial PLP algo-rithm. For example,on theOctane,outof the0.081secondsittakesto rendera frameon average,0.064secondswerespentby thePLPalgorithm.

Primitive Operation Costs To betterunderstandtheren-deringtimesreportedin Fig. 6, we analyzedthe costof per-forming the underlyingprimitive operationsfor eachof themethods.By looking at theseresults,we canoffer additionalinsight into why eachof the methodsworks as well, or aspoorly, asit does.

For the cPLP-HPtechnique,the visibility queriesinvolveenablingtheHP culling extension,renderinga cell, andread-ing backtheflag to indicatewhetherthez-buffer would havechangedif we hadactually renderedthe cell. We timed thevisibility querieson the HP Kayak and found that the timerangedbetween100microseconds(µs) and1000µs. In addi-tion to thesecosts,the HP visibility querycanalsointerruptthe renderingpipeline,therebyreducingthe overall through-put. Consequently, it is imperativewhenusingthesequeriestodo sowith somecaution. It is especiallyadvantageouswhenyou arevery likely to find significantocclusion. Otherwise,many queriesmaybewastedandtheoverall renderingperfor-mancewill bereduced.

Theprimitive operationfor theitem-buffer techniqueis thetransferringof thecolor buffer from thegraphicsacceleratorsmemoryto the main memoryof the host computer. This isdonein OpenGLusingasinglecall to glReadPixels.Theothermaincostassociatedwith this techniqueis thetime it takestheCPU to scanthe color buffer to determinewhich cells haveactually contributed to the image. We report thesenumbersfor eachof our machinesin Table 2. It is immediatelyap-parentwhy the cPLP-IB techniqueon the Kayak is so slow.The transferandscantimesareconsiderablyslower (for the515x512image)than on the SGIs. Another interestingob-servation, which alsohelpsjustify our tiling optimizationinSection3.4, is thesubstantialincreasein time that is requiredto transferandscana512x512pixel image,asopposedto onlya 64x64pixel (sub)image.

For thosemachinesthatsupporttheOpenGLhistogramex-tension,theunderlyingoperationsincludecopying an image,or sub-imagein thecaseof our tiles, from the framebuffer totexturememory. We have timedthis operationbothwith andwithoutthehistogramextensionenabledto seehow muchtimeis requiredfor the copy alone,as well as for the copy withthe histogramcalculations. The histogramcalculationalso

Figure7: Interior view of a skyscrapermodel. cPLPreducedthedepthcomplexity of this renderedimagefrom 26 to 8.

includesthe time to retrieve andscanthe histogramresults.On the Octaneit takes 800µs for a 64x64 pixel image,and34000µsfor a512x512image.OntheOnyx, it takes690µsfora 64x64pixel image,and13500µs for a 512x512image.(Weshouldnotethat it is quitedifficult to performsuchmeasure-ments,but we have doneour bestto reportaccurateresults.)Weweresurprisedby theamountof timerequiredto copy theimageto texturememoryandperformthehistogramcomputa-tions. Our initial belief wasthatby usingtheactualhardwareto performour visibility queries,our renderingtimes woulddecrease.Unfortunately, this is not the caseat this point intime. While the Onyx appearsto be moreadvancedthanthe(newer)Octanein its histogrammingfeatures,neithermachineperformswell enoughto be fasterthan the item-buffer tech-niques.

Depth Comple xity To further test our cPLP algorithms,weconsideredanothermodelwith extremelyhighdepthcom-plexity. Fig. 7 shows an interior view of a skyscrapermodelwhichconsistsof overonemillion triangles.Themodel,cour-tesyof Ned Greene,consistsof 54 copiesof a module,eachwith almost20K triangles.

Thepurposeof this experimentwasto determinethedepthcomplexity of this modelwhenrenderingit usingthevarioustechniques.By depthcomplexity, we referhereto theaveragenumberof times a z-test is performedfor eachpixel in theimage. If our cPLP techniquesare effective at determiningocclusion,our methodsshouldreducethe depthcomplexityconsiderablyin comparisonto astandardrenderingalgorithm.Using onesuchtechnique,VF-BF, we determinedthe depthcomplexity of this model (for this viewpoint) to be 26.70onaverage,for all of the pixels in the image. Using cPLP, wewere able to reducethis value to only 7.97. We emphasizethat thesenumbersrefer to thenumberof z-testsperpixel, asopposedto the numberof z-teststhat pass,which hasbeenreportedin otherapproaches.We optedfor this numbersincethenumberof z-testsmoreaccuratelyreflectthework that isdoneduringtherenderingalgorithm.

9

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 10

PLPBudget(triangles) PLPTime (s) # EXT Tests Avg. ExtraTriangles AverageTime (s) FrameRate(Hz)1,000 0.019 4.688 16844 0.044 22.710,000 0.028 3.376 10978 0.045 22.225,000 0.043 2.426 5641 0.053 18.950,000 0.066 1.908 2796 0.072 13.975,000 0.091 1.630 1770 0.096 10.4100,000 0.112 1.372 1247 0.116 8.6

Table3: Performanceof cPLP-EXTon a “hypothetical” HP Kayak fx6. All timesarereportedin seconds.The averageextratrianglesarethenumberof trianglesthatgetrenderedin additionto thePLPbudget.Seetext for furtherdetails.

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0.13

0.14

0 20 40 60 80 100

Ave

rage

Ren

derin

g T

ime

(sec

onds

)

PLP Triangle Budget (thousands)

Kayak cPLP-HPKayak cPLP-EXT

Figure 8: Averagerenderingtimes per frame for cPLP-HPandourproposedhardwareextensionmethodcPLP-EXT. ThePLPbudget,reportedin thousandsof triangles,determinesthenumberof trianglesinitially renderedto fill-in thedepthbuffer.

cPLP-EXT Sincewe do not actuallyhave hardwarewhichimplementsourproposedextension,hereweextrapolateonitsperformancebasedon theresultswe have, assumingwe wereto addsuchan extensionto theHP Kayak fx6. Using cPLP-IB, it is possibleto determinethenumberof teststhatcanbeperformedin parallel for eachtrianglebudgetin Fig. 6. As-sumingour extensionis properlyimplemented,we believe itshouldtake nomoretime thanthefragmentingcountingtech-niquealreadyavailableon several architectures.While mea-suringon HP machines,we found that in the worst case,anocclusiontestcosts1 ms. But sincewe have to bring moredatafrom the graphicshardware, we will assumehereeachqueryis twice asexpensive, or 2 ms, to accountfor theextradatatransfer. (Sinceonly extremelysmallarraysof 256valuesarebeingtransfered,we don’t actuallybelieve it would havesuchanimpact.)

Table3 summarizesour findings. Basically, we arecom-putingthetime for cPLP-EXTasasumof theinitial PLPcost(initialize its per-framedatastructures,suchaszeroingtheso-lidity of eachcell; andrenderingthefirst batchof trianglesforall frames),plusthetotalnumberof parallelEXT tests(whichwe assumetake 2 mseach),plusthetime to renderingtheex-tratriangles(atarateof approximately1 million triangles/sec)whicharefoundasvisibility testsareperformed.

With theseassumptions,we can seethat our frame rates

getconsiderablybetter(seeFig. 8), andwe couldpotentiallyachieve a framerateof 23 Hz (versus18 Hz for cPLP-HP;animprovementof 28%) if we hada hardware implementationof our extension. We would like to point out that the advan-tagewouldbeevengreaterif thecostof initializing PLP’sper-framedatastructureswasmadelower. OurcurrentPLPimple-mentationusesanSTL set,whichis notparticularlyoptimizedfor linear traversalswhich arenecessaryduring initialization.If necessary, it wouldbepossibleto optimizethiscodefurther.

5 Related Work

Therehasbeena substantialamountof recentwork on occlu-sionculling (see,for instance,[5, 6, 8, 11, 18, 23,24, 30, 31]).The purposeof this sectionis not to do an extensive reviewof all occlusionculling algorithms.For that,we refer the in-terestedreaderto the recentsurveys by Cohen-Oret al. [7]andDurand[10]. Instead,we focuson reviewing work that ismorecloselyrelatedto our own, so that we canindicatethesimilaritiesanddifferenceswith our currentwork.

Closely relatedto our work are techniquesthat usetwo-dimensionaldepth information to avoid renderingoccludedgeometry. An early example of this is a techniquebyMeagher[20] thatstoresthescenein anoctree,andtheframe-buffer in a quadtree. Meagherrendersthe octreein a strictfront-to-backorder, while keepingtrackof which partsof thequadtreegetfilled, in orderto avoid touchingpartsof theoc-treethatcannotbeseen.Naylor [22] proposesanotherversionof this idea,whereinsteadof usinganoctreeanda quadtree,heusestwo binary-spacepartitioningtrees[12], onein 3D, theotherin 2D, to efficiently keepboth the sceneandthe imagerespectively. The3D BSPcanbeusedto traversethesceneinastrict front-to-backorder, andthe2D BSPis usedto keeptheareasof the screenwhich get filled. Our currentapproachesdiffer from thesemethodsin thatthey do not renderin a strictfront-to-backorder(whichwasshown to belesseffective),butratherallow PLPto determinetheorderin which to visit (andrender)thecells.

The HierarchicalZ-Buffer (HZB) techniqueof Greeneetal. [14] is probablythebestknown exampleof atechniquethatefficiently usesdepthinformationfor occlusionculling. Theirtechniqueis relatedto Meagher[20] in that it also usesanoctreefor managingthescene,which is renderedin front-to-backorder. Anothersimilarity is thatthey alsouseaquadtree,but not for the actualframebuffer (as in [20]). Instead,theyusethequadtreeto storethez-buffer values,which allow forfastrejectionof occludedgeometry. TheHZB techniquealsoexplorestemporalcoherency by initializing the depthbufferwith thecontentsof thevisiblegeometryin thepreviousframe.

10

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 11

TheHierarchicalZ-Buffer hasseveralsimilaritiesto cPLP.Their useof thevisible geometryfrom thepreviousframeforthepurposeof estimatingthevisiblegeometryis similar to ourapproach,althoughin ourcase,weusethevisibility estimationpropertiesof PLP to estimatethe currentframe. Oneadvan-tageof doingit this way is that(aswe have shown earlier)thefront intrinsically tells uswhereto continuerenderingto fill-up thez-buffer. HZB hasno suchinformation; it renderstheremaininggeometryin front-to-backorder. The fact that weemploy a spatialpartitioninginsteadof a hierarchyin object-spaceis only a minor difference.Dependinguponthe sceneproperties,this mayor maynotbeanadvantage.Theflat datastructurewe useseemsmoreefficient for a hardware imple-mentation,sincewe do not needto stopthepipelineasoftento determinethe visibility of objects. In [13], Greeneintro-ducesanoptimizedvariationof theHZB technique,includinga non-conservative mode.

A closely relatedtechniqueis the HierarchicalOcclusionMaps of Zhanget al. [33]. For eachframe, objectsfrom aprecomputeddatabasearechosento beoccluders,andareren-dered(possibly) in lower resolutionto get a coveragefoot-print of thepotentialoccluders.Using this image,OpenGL’stexturemappingfunctionalitygeneratesa hierarchyof image-spaceocclusionmaps,which are thenusedto determinethepossibleocclusionof objectsin the scene. Note that in thistechnique,thedepthcomponentis consideredafter it is deter-minedthatanobjectcanpotentiallybeoccluded.Oneof themaindifferencesbetweenHOM andcPLPis thatHOM relieson preprocessingthe input to find its occluders,while cPLPusesPLPfor thatpurpose.HOM alsoutilizesa strict front-to-backtraversalof theobject-spacehierarchy.

Thework by Bartzetal. [3, 4] addressesseveralof thesamequestionswe do in this paper. They provide anefficient tech-niquefor implementingocclusionculling usingcoreOpenGLfunctionality, and then proposea hardware extensionwhichhasthepotentialto improve performance.Similar to thepre-viousmethods,Bartzetal. useahierarchyfor the3D scene.Inorderto determinethevisible nodes,they first performview-frustumculling, which is optimizedby usingtheOpenGLse-lectionmodecapabilities.For theactualocclusiontests,whichareperformedtop-down in thehierarchynodes,they proposeto usea virtual occlusionbuffer, which is implementedusingthestencilbuffer to save theresultsof whena givenfragmenthaspassedthe z-test. In their technique,they needto scanthe stencil buffer to perform eachvisibility test. Sincethishasto beperformedseveral timeswhendeterminingthevisi-ble nodesof a hierarchy, this is themosttime consumingpartof their technique,andthey proposeanoptimizationbasedonsamplingthevirtual occlusionbuffer (thusmakingtheresultsonly approximate). In their paper, they alsoproposean ex-tensionof the HP occlusionculling test [25] (see[3] for de-tails). At this time, theHP occlusiontestsimply tells whethera primitive is visible or not. Bartzet al. proposeanextensionto includemoredetail,suchasnumberof visible pixels,clos-estz-value,minimal-screenspaceboundingbox, etc. Thereareseveraldifferencesbetweentheir work andour own. Firstand foremost,our techniquesare designedto exploit multi-ple occlusionqueriesat one time, which tend to generateasmallernumberof pipelinestalls in the hardware. Also, ourhardwareextensionis moreconservative in its corefunction-ality, but hasthe extra featurethat it would supportmultiplequeries. Oneadditionaldifferenceis that, similar to Greeneetal. [14], cPLPincorporatesaneffective techniquefor fillingup thedepthbuffer soasto minimize thenumberof queries.

We do not believe that it would bedifficult to incorporatethisfeaturewithin theframework of Bartzetal.

The techniqueby Luebke and Georges [19] describeascreen-basedtechniquefor exploiting “visibility portals”,thatis, regionsbetweencells which canpotentially limit visibil-ity from oneregion of spaceto another. Their techniquecanbe seenasa dynamicway to computeinformationsimilar tothat in [28]. Onecanthink of cPLP’s obscuredfront asa sin-gle occluder, which hasa few holes. If we think of theholesas“portals”, this is in certainrespectsanalogousto theworkof Luebke andGeorges. In the context of their colonoscopywork, Hongetal. [15] proposeatechniquewhichmergesLue-bkeandGeorges’sportalswith adepth-buffer basedtechniquesimilar to ours. However, in their work, they exploit thespe-cial propertiesof thecolonbeinga tube-like structure.

HyperZ[21] is aninterestinghardwarefeaturethathasbeenimplementedby ATI. HyperZhasthreedifferentoptimizationsthat improve the performanceof 3D applications.The mainthrustof theoptimizationsis to lower thememorybandwidthrequiredfor updatingthez-buffer, whichthey reportis thesin-gle largestuserof bandwidthon their graphicscards.Oneop-timizationis a techniquefor losslesscompressionof z-values.Anotheris a fastz-buffer clear, whichperformsa lazyclearofthedepthvalues.ATI alsoreportsonanimplementationof thehierarchicalz-buffer in hardware. Detailson the actualfea-turesareonly sketchyandATI hasnot yet exposedany of thefunctionalityof their hardwareto applications.Consequently,it is not possibleat this point to exploit their hardwarefunc-tionality for occlusionculling.

Another recent techniquerelated to the hierarchicalZ-buffer is describedby Xie andShantz[32]. They proposetheAdaptiveHierarchicalVisibility (AHV) algorithmasasimpli-ficationof HZB for tile architectures.

AlonsoandHolzschuch[2] proposea techniquewhich ex-ploits thegraphicshardwarefor speedingup visibility queriesin the context of global illumination techniques.Their tech-niqueis similar to our item-buffer technique.Westermannetal. [29] proposea different techniquefor using the OpenGLhistogramfunctionality for occlusionculling. Their work in-volveshistogrammingthe stencilbuffer, insteadof the colorbuffer asdonein ourwork.

6 Conc lusions

In thispaperwepresentedanovel conservative visibility algo-rithm basedon thenon-conservative PLP algorithm. Our ap-proachexploitsseveralfeaturesof PLPto quickly estimatethecorrectimage(anddepthbuffer) andto determinewhich por-tionsof this estimationwereincorrect. To completeour con-servativeapproach,werequiredanefficientmeansof perform-ing visibility querieswith respectto thecurrentestimationim-age.We showedhow to implementthesevisibility queriesus-ing eitherhardwareandsoftware. If fragment-countinghard-ware is available (suchas on HP fx, DiamondFireGL, SGIIR3), this is clearlythebestchoice.Otherwise,theitem-buffertechniqueis thenext bestoption. As graphicshardwarecon-tinuesto improve,andif theOpenGLhistogrammingfeaturesarefurtheroptimized,thisapproachmayoffer thehighestlev-elsof interactive rendering.

Our cPLPapproachhasseveral nice features. It providesa muchhigherlevel of interactivity thantraditionalrenderingalgorithms,suchasview frustumculling. As opposedto PLP,cPLPprovidesaconservative visibility culling algorithm.The

11

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 12

preprocessingrequiredby our algorithmis very modest,andwe arenot requiredto storesignificantocclusioninformation,suchasview-dependentoccludersor potentiallyvisible sets.We arealsoableto run our algorithmon all (polygonal)datasetssincewedonotrequireany underlyingstructureor format,suchasconnectivity information.

Further investigationis necessaryto study the feasibility(cost) of adding our hardware extension proposedin Sec-tion 3.6 to currentarchitectures.As we show in this paper, itcanfurther improve the performancesubstantiallyover tech-niquesthatprovidea singlecounterof thefragmentsthatpassthe depth-test,such as the HP occlusion-cullingextension,sinceit is ableto performseveral testin parallel.

Ackno wledg ements

WethankCraigWittenbrinkfor helpwith theocclusioncullingcapabilitiesof the HP fx seriesaccelerators.Craig providedus with muchneededmaterial,including documentationanda sampleimplementationthat showed us how to usethe HPocclusiontest.Many thanksto Prof. CarloSequinandhisstu-dentsat theUniversity of California,Berkeley for theSODAHall modelusedin our experiments.Thanksto Ned Greenefor providing us with the Skyscraperdataset,andfor helpingin tracking down hard to find references.Thanksto DavidKirk of Nvidia for discussionsaboutthecomplexity of addingtheper-fragmenthistogrammingextension.Finally, we thankDirk Bartzfor commentsonanearlierversionof this paper.

References

[1] OpenGL histogramdocumentation. http://www.open-gl.org/developers/documentation/Version1.2/1.2specs/-histogram.txt.

[2] L. AlonsoandN. Holzschuch.Usinggraphicshardwareto speed-upyour visibility queries.Journal of GraphicsTools, to appear.

[3] D. Bartz, M. Meißner, and T. Huttner. Extendinggraphicshardware for occlusion queries in OpenGL.1998SIGGRAPH/ EurographicsWorkshopon Graph-ics Hardware, pages97–104,August,1998.

[4] D. Bartz,M. Meißner, andT. Huttner. OpenGL-assistedocclusionculling for largepolygonalmodels.Computers& Graphics, 23(5):667–679,October, 1999.

[5] F. Bernardini, J. T. Klosowski, and J. El-Sana. Di-rectionaldiscretizedoccludersfor acceleratedocclusionculling. ComputerGraphicsForum, 19(3):507–516,Au-gust,2000.

[6] Y. Chrysanthou,D. Cohen-Or, andD. Lischinski. Fastapproximatequantitative visibility for complex scenes.ComputerGraphics International ’98, pages220-229,June,1998.

[7] D. Cohen-Or, Y. Chrysanthou,andC. Silva. A Surveyof Visibility for WalkthroughApplications. Submittedfor publication, 2000. Also in “Visibility, problems,techniques,andapplications”,ACM SIGGRAPH2000Course#4,2000.

[8] D. Cohen-Or, G. Fibich, D. Halperin,andE. Zadicario.Conservative visibility and strongocclusionfor views-pacepartitioningof denselyoccludedscenes.ComputerGraphicsForum, 17(3):243–254,1998.

[9] R. Cunniff. Visualizefx graphicsscalablearchitecture.In Hot 3D Proceedings, GraphicsHardware Workshop2000,Interlaken,Switzerland,August,2000.

[10] F. Durand. 3D Visibility: AnalyticalstudyandApplica-tions. PhDthesis,UniversiteJosephFourier, Grenoble,France,July, 1999.

[11] F. Durand,G. Drettakis,J. Thollot, andC. Puech.Con-servative visibility preprocessingusingextendedprojec-tions.Proceedingsof SIGGRAPH2000, pages239–248,July, 2000.

[12] H. Fuchs,Z. M. Kedem,andB. Naylor. On visible sur-facegenerationby a priori treestructures.Proceedingsof SIGGRAPH1980, pages124–133,1980.

[13] N. Greene. OcclusionCulling with OptimizedHierar-chicalBuffering. In Proc.ACM SIGGRAPH’99SketchesandApplications, page261,August,1999.

[14] N. Greene,M. Kass,andG.Miller. HierarchicalZ-buffervisibility. In ComputerGraphicsProceedings,AnnualConferenceSeries,1993, pages231–240,1993.

[15] L. Hong, S. Muraki, A. Kaufman, D. Bartz, and T.He. Virtual voyage: Interactive navigation in the hu-mancolon.Proceedingsof SIGGRAPH97, pages27–34,1997.

[16] J.T. Klosowski andC. T. Silva. Renderingon a budget:A framework for time-critical rendering. IEEE Visual-ization’99, pages115–122,October, 1999.

[17] J.T. Klosowski andC. T. Silva. Theprioritized-layeredprojection algorithm for visible set estimation. IEEETransactionson Visualizationand ComputerGraphics,6(2):108–123,April - June,2000.

[18] V. Koltun, Y. Chrysanthou,and D. Cohen-Or. Virtualoccluders:An efficient intermediatepvs representation.RenderingTechniques2000: 11th EurographicsWork-shopon Rendering, pages59–70,June,2000.

[19] D. Luebke andC. Georges.Portalsandmirrors: Simple,fastevaluationof potentiallyvisible sets. In 1995ACMSymposiumon Interactive3D Graphics, pages105–106,1995.

[20] D. Meagher. Efficientsyntheticimagegenerationof arbi-trary3-dobjects.In Proceedingsof IEEEConferenceonPatternRecognition and Image Processing, pages473–478,June,1982.

[21] S.Morein. ATI RadeonHyper-Z technology. In Hot 3DProceedings, GraphicsHardwareWorkshop2000,Inter-laken,Switzerland,August,2000.

[22] B. F. Naylor. Partitioningtreeimagerepresentationandgenerationfrom 3D geometricmodels. In Proceedingsof GraphicsInterface’92, pages201–212,May, 1992.

12

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 13

[23] C.Saona-Vazquez,I. Navazo,andP. Brunet.Thevisibil-ity octree:A datastructurefor 3dnavigation.ComputersandGraphics, 23(5):635–643,1999.

[24] G. Schaufler, J. Dorsey, X. Decoret,and F. X. Sillion.Conservative volumetricvisibility with occluderfusion.Proceedingsof SIGGRAPH2000, pages229–238,July,2000.

[25] N. Scott,D. Olsen,andE. Gannet.An overview of thevisualizefx graphicsacceleratorhardware.TheHewlett-Packard Journal, May:28–34,1998.

[26] K. Severson. VISUALIZE WorkstationGraphicsforWindows NT. HP productliterature.

[27] Silicon Graphics, Inc. SGI Visual Worksta-tion OpenGL Programming Guide for Win-dows NT. Document Number 007-3876-001.https://www.sgi.com/developers/nt/sdk/files/-OpenGLEXT.pdf

[28] S. J. Teller and C. H. Sequin. Visibility preprocess-ing for interactive walkthroughs.In ComputerGraphics(SIGGRAPH’91 Proceedings), volume25,pages61–69,July, 1991.

[29] R. Westermann,O. Sommer, and T. Ertl. DecouplingPolygonRenderingfrom GeometryusingRasterizationHardware.Unpublishedmanuscript,1999.

[30] P. WonkaandD. Schmalsteig.Occludershadows for fastwalkthroughsof urbanenvironments.ComputerGraph-ics Forum, 18(3):51–60,September, 1999.

[31] P. Wonka, M. Wimmer, and D. Schmalstieg. Visibil-ity preprocessingwith occluderfusion for urbanwalk-throughs.RenderingTechniques2000:11thEurograph-ics Workshopon Rendering, pages71–82,June,2000.

[32] F. Xie andM. Shantz. Adaptive hierarchicalvisibilityin a tiled architecture.1999SIGGRAPH/ EurographicsWorkshoponGraphicsHardware, pages75–84,August,1998.

[33] H. Zhang, D. Manocha,T. Hudson, and K. E. HoffIII. Visibility culling usinghierarchicalocclusionmaps.Proceedingsof SIGGRAPH97, pages77–88,1997.

13

EfficientConservative Visibility Culling UsingThePrioritized-LayeredProjectionAlgorithm 14

(a)

(b) (c) (d) (e)

Figure9: (a) A top-down view of ourdataset.(b)–(e)Sampleviews of therecordedpath.

(a) (b) (c)

(d) (e) (f)

Figure10: Snapshotsduring threeiterationsof our cPLPalgorithm. Thecurrentfront (blue) andcompletedtiles (red)arehigh-lightedfor iteration1 in (a)and(d), iteration2 in (b) and(e),anditeration3 in (c) and(f). Thefinal renderedimageis (c).

14