M arch h of the Frob blins s: - AMD Developer Central

50
A N 5 4 5 6 7 Advances in Re N. Tatarchuk (E 52| Page Sim Cr jeremy.shopf@ josh.barczak@ chris.oat@am natalya.tatarch ealͲTime Rende Editor) M mula rowd Figure @amd.com @amd.com md.com h[email protected]m ering in 3D Gra arch ation s of Cre 1. Froblins m Jeremy Shopf 4 Gam aphics and Gam h of and Intel eatur navigate to Joshua Barczak 5 me Comput A mes Course – S the Rend lligen res o a mushroom Christop Oat ting Applic AMD, Inc. IGGRAPH 2008 Frob derin nt an on GP m patch goal pher 6 N Tat cations Gro 8 blins ng Ma nd De PU l and harves Natalya tarchuk 7 oup Cha s: assiv etaile t food. apter 3 ve ed

Transcript of M arch h of the Frob blins s: - AMD Developer Central

AN

5

4

5

6

7

AdvancesinReNTatarchuk(E

52|P a g e

SimCr

jeremyshopf joshbarczak chrisoatam natalyatatarch

ealͲTimeRendeEditor)

Mmularowd

Figure

amdcom amdcom mdcom

hukamdcom

eringin3DGra

archation s of

Cre

1 Froblins

m

JeremyShopf4

Gam

aphicsandGam

h of and Inteleatur

navigate to

JoshuaBarczak5

meComputA

mesCoursendashS

the Rendlligenres o

a mushroom

ChristopOat

tingApplicAMDInc

IGGRAPH2008

Frobderinnt anon GP

m patch goal

pher6

NTat

cationsGro

8

blinsng Mand DePU

l and harves

Natalyatarchuk7

oup

Cha

s assiv

etaile

t food

apter 3

ve ed

Chapter 3 March of the Froblins

53|P a g e

31 BeyondPrettyPicturestowardIntelligentInteractiveExperiences Artificial intelligence(AI) isgenerallyconsideredtobeoneofthekeycomponentsofacomputergameSometimeswhenweplayagamewemaywish that thecomputeropponentswerewrittenbetterAtthose times while playing against the computer we feel that the game is unbalanced Perhaps thecomputerplayerhasbeengivendifferentsetof rulesoruses thesame rulesbuthasmore resources(healthweapons etc) The complexity of underlying AI systems alongwith game design belies theresulting feelingwehavewhenplayinganygameAs theCPUandGPUspeedandpowercontinues togrowalongwithincreasingmemoryamountsandbandwidthgamedevelopersareconstantlyimprovingthegraphicsof theirgames In the last fiveyears theproductionqualityofgameshasbeen increasing(alongwiththecorrespondingbudgets)RecentgameswooplayerswithincrediblebreakthroughsinrealͲtime3DgraphicscomplexityoftheworldsandcharactersaswellasvariouspostͲprocessingeffectsAndwhile there had been tremendous improvements for parallelizing rendering through the evolution ofconsumerGPUpipelinesartificialintelligencecomputationsaretreadingbehindTodatetherehadbeenratherfewattemptsatparallelizingAIcomputationsTypicallyinagameAIcontrolsthebehaviorofnonͲplayerͲcharacters(NPC)whethertheyarefriendlytotheplayeroractasgameopponentsThismay includeactualcharactersor itcansimplybetanksandarmies(suchasinarealͲtimestrategygame)ormonstersinafirstͲpersonshooterTheuniformfeelingisthebettertheAIisthebetterthegameAmoresophisticatedAIsystemallowsformoreinterestingandfungameplayArtificial intelligence isused forvariouspartsof thegameTypicalcomputations includepathfindingobstacleavoidanceanddecisionsmakingThesecalculationsareneededregardlessofthegenre of interactive entertainment be it a realͲtime strategy game anMMORPG or a firstͲpersonshooterWewillalsosoonseedynamiccharacterͲcentricentertainmentintheformofinteractivemovieswheretheviewerwillhavecontrolovertheoutcomeInmany scenarios the AI computations include dynamic path finding This involves autoͲsimulatingcharactersrsquobehaviorandorrunningaterrainanalysistoidentifygoodorupdatevalidpathsasresultofgameplayThesecomputationscanbequiteahogonCPUtimebudgeteveninmultiͲcorescenariosAsaresultmanygamedevelopersarelookingforwaystominimizetheCPUhitofpathfindingBecausepathfindingandAIingeneralissuchacomputeͲintensiveexpensivecalculationweoftenseeboringzombieͲlike NPC interaction Furthermore when gameplay and physics are simulated on the CPU and thecharactersare renderedon theGPU there isanadditionalPCIͲEdata transferoverhead for characterpositionsandstateIfwecanutilizetheGPUforrunninginͲgameAIcodenotonlywecanspeeduppathfindingbutwecanalsointroduceanumberofotherinterestingeffectsOurcharacterscanstartlivingontheirownresultinginsoͲcalledldquoemergentbehaviorsrdquondashsuchaslaneformationqueuingandreactionstoothercharactersandsoonAndthismeansthatgameplaywillbealotmorefunWith this inmindwe setout toexplore thenextvisual frontier for interactiveexperience combiningmassivelyparallelAIcomputationswithhighfidelityrenderingalgorithmsInthischapterwewilldescribethesimulationandrenderingmethodsusedfortheAMDFroblinsdemodesignedtoshowcasemanyofthe new approaches for characterͲcentric entertainment These techniques aremade possible by themassivelyparallelcomputeavailableonthelatestcommodityGPUssuchasATIRadeonregHD4800seriesIn our fantasy largeͲscale environment with thousands of highly detailed intelligent characters the

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

54|P a g e

Froblins (frog goblins) are concurrently simulated animated and rendered entirely on the GPU Theindividualcharacter logicforeachfroblincreature iscontrolledviaacomplexshader(over3200shaderinstructions)We are utilizing the latest functionality availablewith the DirectXreg 101 API hardwaretessellation high fidelity rendering with 4X MSAA settings at HD resolution with gammaͲcorrectrendering full high dynamic range FP16 pipeline and advanced postͲprocessing effects The crowdbehavior simulation is performed directly on theGPUWewill describe aGPUͲfriendly pathͲplanningframeworkforlargeͲscalecrowdsimulationThisframeworkcanalsobeusedtosimulatelargercrowdsofsimplifiedagentswithsmallerpolygonalcountOursystemhasbeenusedtosimulate65000agentsatrealͲtimeframeratesonasinglecommodityGPUBycombiningacontinuumͲbasedglobalpathplannerwithafineͲgrainedagentͲbased localavoidancemodelwecanperformexpensiveglobalplanningatacoarseresolutionand lowerupdateratewhile the localmodel takescareofavoidingotheragentsandnearby obstacles at a higher frequency To our knowledge this is the firstmassive crowd simulationperformedentirelyonaGPUInourinteractiveenvironmentwerenderthousandsofanimatedintelligentcharactersfromavarietyofviewpoints ranging from extreme closeͲups (with individual characters rendered at over 16 milliontriangles forcloseͲupdetail) to farawayldquobirdrsquoseyerdquoviewsof theentiresystem (over three thousandscharacters at the same time) Our system combines stateͲofͲtheͲart parallel artificial intelligencecomputationfordynamicpathfindingandlocalavoidanceontheGPUmassivecrowdrenderingwithLODmanagementwithhighendrenderingcapabilitiessuchasGPUtessellationforhighqualitycloseͲupsandstableperformance terrain systemcascaded shadows for largeͲrangeenvironmentsandanadvancedglobal illumination systemWe are able to render ourworld at interactive rates (over 20 fps on ATIRadeonregHD4870)withstaggeringpolygoncount(6ndash8milliontrianglesonaverageat20Ͳ25fps)whilemaintainingthefullhighqualitylightingandshadowingsolution

32 ArtificialIntelligenceonGPUforDynamicPathfinding321 GlobalPathfindingManysystemsforcrowdsimulationsrelyonagentͲbasedsolutionswherethemovementiscomputedforindividualagentsseparatelyWhiletherearecertainadvantagestothisapproach(independentdecisionsfor each agent individual visibility and environment information) it is also challenging to developbehavioralrules fortheagentsthatresult inaconsistentandrealisticoverallcrowdmovementAtthesame time scalingagentͲbasedmethods fora largenumberofagents isprohibitively computationallyexpensivewhichisaconcernforinteractivescenariossuchasvideogamesFor our crowd simulation solution we chose a continuumͲbased approach similar to the ContinuumCrowds work by Treuille et al [TCP06] Thismethod convertsmotion planning into an optimizationproblem usingwellͲknown numericalmethods from optics and general physics for stable navigationsolutionThis type ofmethod is particularly well suited for simulating large numbers of agents because it iscomputed spatially instead of perͲagent and results in smooth movement with no ldquodeadͲendsrdquoAdditionallyacontinuumapproachresults inflowͲlikemovementwhich ischaracteristicofactual large

C

cdmIrtinTa

wisc

Fmtt

Chapter 3 Ma

crowdsThedecision logmethodspro

nthiscontinreferred toaterrainslopen the envirneighboring

Thiscost funanylocation

where isܨ tntuitive toshortestͲpatcomputinga

Figure 2 Emovement dithat the arrothreat

arch of the Fr

globalmodgic and theoducesmoot

nuumͲbasedasa speedeetc)andaronment Thlocationand

nction isthetotheneare

the positivesee how thh from evepropagating

Example of irections Asows near th

oblins

elisonlyanerefore it isthandrealis

dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe

enusedas iestgoalThi

eͲvalued coshe functionry locationgwavefront

dynamic pas the large ge ldquomonster

napproximaaugmented

sticcrowdm

ulationtheThis cost funactor(basednction descevaluatethe

nputtoasospotential(

ԡሺݔሻԡ ൌ

st function could beݔ In this ctfollowingt

ath finding ghost Froblirdquo are pointi

ationtoaccud by local

movemente

environmennction incorponagentdribes the teoptimalpa

olverthatca()iscalcula

ൌ ܨ

evaluated ie constructecontextwethepathof

in game-likin scares awing away d

uratelongͲtecollision avospeciallyina

ntisformulaporatesbotensitylargeravelͲtime tth

alculatestheatedsuchtha

(1)

in the direced by integrcan think oleastresista

ke scenarioway the littledirecting the

ermplanninoidance proareasofden

atedasacoh theachieeͲscaleobstato move fro

etotaltraveatitsatisfies

ction of therating the cof the globance

o The arrowe critters th

characters

5

gwithfullvoblem Togensecongestio

stfunctionvable speedaclesetc)foom one loc

elͲtime (potestheeikona

e gradient ost functional crowdmo

ws visualizeey scamper away from

55|P a g e

visibilityandether theseon

(sometimesd (basedonorlocationscation to a

ential) fromlequation

ሻݔሺ It isn along theovement as

e character away Note a potential

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

56|P a g e

By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows

1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring

KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof

KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist

NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure

Figure 3 Illustration of neighboring cells used in finite difference approximation

Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation

ቀఝಾఝಾቁଶቀఝಾఝಾ

ቁଶൌ ͳ (3)

Chapter 3 March of the Froblins

57|P a g e

Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows

1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf

thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire

tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3

toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor

thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

53|P a g e

31 BeyondPrettyPicturestowardIntelligentInteractiveExperiences Artificial intelligence(AI) isgenerallyconsideredtobeoneofthekeycomponentsofacomputergameSometimeswhenweplayagamewemaywish that thecomputeropponentswerewrittenbetterAtthose times while playing against the computer we feel that the game is unbalanced Perhaps thecomputerplayerhasbeengivendifferentsetof rulesoruses thesame rulesbuthasmore resources(healthweapons etc) The complexity of underlying AI systems alongwith game design belies theresulting feelingwehavewhenplayinganygameAs theCPUandGPUspeedandpowercontinues togrowalongwithincreasingmemoryamountsandbandwidthgamedevelopersareconstantlyimprovingthegraphicsof theirgames In the last fiveyears theproductionqualityofgameshasbeen increasing(alongwiththecorrespondingbudgets)RecentgameswooplayerswithincrediblebreakthroughsinrealͲtime3DgraphicscomplexityoftheworldsandcharactersaswellasvariouspostͲprocessingeffectsAndwhile there had been tremendous improvements for parallelizing rendering through the evolution ofconsumerGPUpipelinesartificialintelligencecomputationsaretreadingbehindTodatetherehadbeenratherfewattemptsatparallelizingAIcomputationsTypicallyinagameAIcontrolsthebehaviorofnonͲplayerͲcharacters(NPC)whethertheyarefriendlytotheplayeroractasgameopponentsThismay includeactualcharactersor itcansimplybetanksandarmies(suchasinarealͲtimestrategygame)ormonstersinafirstͲpersonshooterTheuniformfeelingisthebettertheAIisthebetterthegameAmoresophisticatedAIsystemallowsformoreinterestingandfungameplayArtificial intelligence isused forvariouspartsof thegameTypicalcomputations includepathfindingobstacleavoidanceanddecisionsmakingThesecalculationsareneededregardlessofthegenre of interactive entertainment be it a realͲtime strategy game anMMORPG or a firstͲpersonshooterWewillalsosoonseedynamiccharacterͲcentricentertainmentintheformofinteractivemovieswheretheviewerwillhavecontrolovertheoutcomeInmany scenarios the AI computations include dynamic path finding This involves autoͲsimulatingcharactersrsquobehaviorandorrunningaterrainanalysistoidentifygoodorupdatevalidpathsasresultofgameplayThesecomputationscanbequiteahogonCPUtimebudgeteveninmultiͲcorescenariosAsaresultmanygamedevelopersarelookingforwaystominimizetheCPUhitofpathfindingBecausepathfindingandAIingeneralissuchacomputeͲintensiveexpensivecalculationweoftenseeboringzombieͲlike NPC interaction Furthermore when gameplay and physics are simulated on the CPU and thecharactersare renderedon theGPU there isanadditionalPCIͲEdata transferoverhead for characterpositionsandstateIfwecanutilizetheGPUforrunninginͲgameAIcodenotonlywecanspeeduppathfindingbutwecanalsointroduceanumberofotherinterestingeffectsOurcharacterscanstartlivingontheirownresultinginsoͲcalledldquoemergentbehaviorsrdquondashsuchaslaneformationqueuingandreactionstoothercharactersandsoonAndthismeansthatgameplaywillbealotmorefunWith this inmindwe setout toexplore thenextvisual frontier for interactiveexperience combiningmassivelyparallelAIcomputationswithhighfidelityrenderingalgorithmsInthischapterwewilldescribethesimulationandrenderingmethodsusedfortheAMDFroblinsdemodesignedtoshowcasemanyofthe new approaches for characterͲcentric entertainment These techniques aremade possible by themassivelyparallelcomputeavailableonthelatestcommodityGPUssuchasATIRadeonregHD4800seriesIn our fantasy largeͲscale environment with thousands of highly detailed intelligent characters the

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

54|P a g e

Froblins (frog goblins) are concurrently simulated animated and rendered entirely on the GPU Theindividualcharacter logicforeachfroblincreature iscontrolledviaacomplexshader(over3200shaderinstructions)We are utilizing the latest functionality availablewith the DirectXreg 101 API hardwaretessellation high fidelity rendering with 4X MSAA settings at HD resolution with gammaͲcorrectrendering full high dynamic range FP16 pipeline and advanced postͲprocessing effects The crowdbehavior simulation is performed directly on theGPUWewill describe aGPUͲfriendly pathͲplanningframeworkforlargeͲscalecrowdsimulationThisframeworkcanalsobeusedtosimulatelargercrowdsofsimplifiedagentswithsmallerpolygonalcountOursystemhasbeenusedtosimulate65000agentsatrealͲtimeframeratesonasinglecommodityGPUBycombiningacontinuumͲbasedglobalpathplannerwithafineͲgrainedagentͲbased localavoidancemodelwecanperformexpensiveglobalplanningatacoarseresolutionand lowerupdateratewhile the localmodel takescareofavoidingotheragentsandnearby obstacles at a higher frequency To our knowledge this is the firstmassive crowd simulationperformedentirelyonaGPUInourinteractiveenvironmentwerenderthousandsofanimatedintelligentcharactersfromavarietyofviewpoints ranging from extreme closeͲups (with individual characters rendered at over 16 milliontriangles forcloseͲupdetail) to farawayldquobirdrsquoseyerdquoviewsof theentiresystem (over three thousandscharacters at the same time) Our system combines stateͲofͲtheͲart parallel artificial intelligencecomputationfordynamicpathfindingandlocalavoidanceontheGPUmassivecrowdrenderingwithLODmanagementwithhighendrenderingcapabilitiessuchasGPUtessellationforhighqualitycloseͲupsandstableperformance terrain systemcascaded shadows for largeͲrangeenvironmentsandanadvancedglobal illumination systemWe are able to render ourworld at interactive rates (over 20 fps on ATIRadeonregHD4870)withstaggeringpolygoncount(6ndash8milliontrianglesonaverageat20Ͳ25fps)whilemaintainingthefullhighqualitylightingandshadowingsolution

32 ArtificialIntelligenceonGPUforDynamicPathfinding321 GlobalPathfindingManysystemsforcrowdsimulationsrelyonagentͲbasedsolutionswherethemovementiscomputedforindividualagentsseparatelyWhiletherearecertainadvantagestothisapproach(independentdecisionsfor each agent individual visibility and environment information) it is also challenging to developbehavioralrules fortheagentsthatresult inaconsistentandrealisticoverallcrowdmovementAtthesame time scalingagentͲbasedmethods fora largenumberofagents isprohibitively computationallyexpensivewhichisaconcernforinteractivescenariossuchasvideogamesFor our crowd simulation solution we chose a continuumͲbased approach similar to the ContinuumCrowds work by Treuille et al [TCP06] Thismethod convertsmotion planning into an optimizationproblem usingwellͲknown numericalmethods from optics and general physics for stable navigationsolutionThis type ofmethod is particularly well suited for simulating large numbers of agents because it iscomputed spatially instead of perͲagent and results in smooth movement with no ldquodeadͲendsrdquoAdditionallyacontinuumapproachresults inflowͲlikemovementwhich ischaracteristicofactual large

C

cdmIrtinTa

wisc

Fmtt

Chapter 3 Ma

crowdsThedecision logmethodspro

nthiscontinreferred toaterrainslopen the envirneighboring

Thiscost funanylocation

where isܨ tntuitive toshortestͲpatcomputinga

Figure 2 Emovement dithat the arrothreat

arch of the Fr

globalmodgic and theoducesmoot

nuumͲbasedasa speedeetc)andaronment Thlocationand

nction isthetotheneare

the positivesee how thh from evepropagating

Example of irections Asows near th

oblins

elisonlyanerefore it isthandrealis

dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe

enusedas iestgoalThi

eͲvalued coshe functionry locationgwavefront

dynamic pas the large ge ldquomonster

napproximaaugmented

sticcrowdm

ulationtheThis cost funactor(basednction descevaluatethe

nputtoasospotential(

ԡሺݔሻԡ ൌ

st function could beݔ In this ctfollowingt

ath finding ghost Froblirdquo are pointi

ationtoaccud by local

movemente

environmennction incorponagentdribes the teoptimalpa

olverthatca()iscalcula

ൌ ܨ

evaluated ie constructecontextwethepathof

in game-likin scares awing away d

uratelongͲtecollision avospeciallyina

ntisformulaporatesbotensitylargeravelͲtime tth

alculatestheatedsuchtha

(1)

in the direced by integrcan think oleastresista

ke scenarioway the littledirecting the

ermplanninoidance proareasofden

atedasacoh theachieeͲscaleobstato move fro

etotaltraveatitsatisfies

ction of therating the cof the globance

o The arrowe critters th

characters

5

gwithfullvoblem Togensecongestio

stfunctionvable speedaclesetc)foom one loc

elͲtime (potestheeikona

e gradient ost functional crowdmo

ws visualizeey scamper away from

55|P a g e

visibilityandether theseon

(sometimesd (basedonorlocationscation to a

ential) fromlequation

ሻݔሺ It isn along theovement as

e character away Note a potential

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

56|P a g e

By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows

1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring

KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof

KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist

NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure

Figure 3 Illustration of neighboring cells used in finite difference approximation

Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation

ቀఝಾఝಾቁଶቀఝಾఝಾ

ቁଶൌ ͳ (3)

Chapter 3 March of the Froblins

57|P a g e

Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows

1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf

thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire

tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3

toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor

thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

54|P a g e

Froblins (frog goblins) are concurrently simulated animated and rendered entirely on the GPU Theindividualcharacter logicforeachfroblincreature iscontrolledviaacomplexshader(over3200shaderinstructions)We are utilizing the latest functionality availablewith the DirectXreg 101 API hardwaretessellation high fidelity rendering with 4X MSAA settings at HD resolution with gammaͲcorrectrendering full high dynamic range FP16 pipeline and advanced postͲprocessing effects The crowdbehavior simulation is performed directly on theGPUWewill describe aGPUͲfriendly pathͲplanningframeworkforlargeͲscalecrowdsimulationThisframeworkcanalsobeusedtosimulatelargercrowdsofsimplifiedagentswithsmallerpolygonalcountOursystemhasbeenusedtosimulate65000agentsatrealͲtimeframeratesonasinglecommodityGPUBycombiningacontinuumͲbasedglobalpathplannerwithafineͲgrainedagentͲbased localavoidancemodelwecanperformexpensiveglobalplanningatacoarseresolutionand lowerupdateratewhile the localmodel takescareofavoidingotheragentsandnearby obstacles at a higher frequency To our knowledge this is the firstmassive crowd simulationperformedentirelyonaGPUInourinteractiveenvironmentwerenderthousandsofanimatedintelligentcharactersfromavarietyofviewpoints ranging from extreme closeͲups (with individual characters rendered at over 16 milliontriangles forcloseͲupdetail) to farawayldquobirdrsquoseyerdquoviewsof theentiresystem (over three thousandscharacters at the same time) Our system combines stateͲofͲtheͲart parallel artificial intelligencecomputationfordynamicpathfindingandlocalavoidanceontheGPUmassivecrowdrenderingwithLODmanagementwithhighendrenderingcapabilitiessuchasGPUtessellationforhighqualitycloseͲupsandstableperformance terrain systemcascaded shadows for largeͲrangeenvironmentsandanadvancedglobal illumination systemWe are able to render ourworld at interactive rates (over 20 fps on ATIRadeonregHD4870)withstaggeringpolygoncount(6ndash8milliontrianglesonaverageat20Ͳ25fps)whilemaintainingthefullhighqualitylightingandshadowingsolution

32 ArtificialIntelligenceonGPUforDynamicPathfinding321 GlobalPathfindingManysystemsforcrowdsimulationsrelyonagentͲbasedsolutionswherethemovementiscomputedforindividualagentsseparatelyWhiletherearecertainadvantagestothisapproach(independentdecisionsfor each agent individual visibility and environment information) it is also challenging to developbehavioralrules fortheagentsthatresult inaconsistentandrealisticoverallcrowdmovementAtthesame time scalingagentͲbasedmethods fora largenumberofagents isprohibitively computationallyexpensivewhichisaconcernforinteractivescenariossuchasvideogamesFor our crowd simulation solution we chose a continuumͲbased approach similar to the ContinuumCrowds work by Treuille et al [TCP06] Thismethod convertsmotion planning into an optimizationproblem usingwellͲknown numericalmethods from optics and general physics for stable navigationsolutionThis type ofmethod is particularly well suited for simulating large numbers of agents because it iscomputed spatially instead of perͲagent and results in smooth movement with no ldquodeadͲendsrdquoAdditionallyacontinuumapproachresults inflowͲlikemovementwhich ischaracteristicofactual large

C

cdmIrtinTa

wisc

Fmtt

Chapter 3 Ma

crowdsThedecision logmethodspro

nthiscontinreferred toaterrainslopen the envirneighboring

Thiscost funanylocation

where isܨ tntuitive toshortestͲpatcomputinga

Figure 2 Emovement dithat the arrothreat

arch of the Fr

globalmodgic and theoducesmoot

nuumͲbasedasa speedeetc)andaronment Thlocationand

nction isthetotheneare

the positivesee how thh from evepropagating

Example of irections Asows near th

oblins

elisonlyanerefore it isthandrealis

dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe

enusedas iestgoalThi

eͲvalued coshe functionry locationgwavefront

dynamic pas the large ge ldquomonster

napproximaaugmented

sticcrowdm

ulationtheThis cost funactor(basednction descevaluatethe

nputtoasospotential(

ԡሺݔሻԡ ൌ

st function could beݔ In this ctfollowingt

ath finding ghost Froblirdquo are pointi

ationtoaccud by local

movemente

environmennction incorponagentdribes the teoptimalpa

olverthatca()iscalcula

ൌ ܨ

evaluated ie constructecontextwethepathof

in game-likin scares awing away d

uratelongͲtecollision avospeciallyina

ntisformulaporatesbotensitylargeravelͲtime tth

alculatestheatedsuchtha

(1)

in the direced by integrcan think oleastresista

ke scenarioway the littledirecting the

ermplanninoidance proareasofden

atedasacoh theachieeͲscaleobstato move fro

etotaltraveatitsatisfies

ction of therating the cof the globance

o The arrowe critters th

characters

5

gwithfullvoblem Togensecongestio

stfunctionvable speedaclesetc)foom one loc

elͲtime (potestheeikona

e gradient ost functional crowdmo

ws visualizeey scamper away from

55|P a g e

visibilityandether theseon

(sometimesd (basedonorlocationscation to a

ential) fromlequation

ሻݔሺ It isn along theovement as

e character away Note a potential

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

56|P a g e

By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows

1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring

KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof

KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist

NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure

Figure 3 Illustration of neighboring cells used in finite difference approximation

Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation

ቀఝಾఝಾቁଶቀఝಾఝಾ

ቁଶൌ ͳ (3)

Chapter 3 March of the Froblins

57|P a g e

Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows

1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf

thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire

tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3

toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor

thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

cdmIrtinTa

wisc

Fmtt

Chapter 3 Ma

crowdsThedecision logmethodspro

nthiscontinreferred toaterrainslopen the envirneighboring

Thiscost funanylocation

where isܨ tntuitive toshortestͲpatcomputinga

Figure 2 Emovement dithat the arrothreat

arch of the Fr

globalmodgic and theoducesmoot

nuumͲbasedasa speedeetc)andaronment Thlocationand

nction isthetotheneare

the positivesee how thh from evepropagating

Example of irections Asows near th

oblins

elisonlyanerefore it isthandrealis

dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe

enusedas iestgoalThi

eͲvalued coshe functionry locationgwavefront

dynamic pas the large ge ldquomonster

napproximaaugmented

sticcrowdm

ulationtheThis cost funactor(basednction descevaluatethe

nputtoasospotential(

ԡሺݔሻԡ ൌ

st function could beݔ In this ctfollowingt

ath finding ghost Froblirdquo are pointi

ationtoaccud by local

movemente

environmennction incorponagentdribes the teoptimalpa

olverthatca()iscalcula

ൌ ܨ

evaluated ie constructecontextwethepathof

in game-likin scares awing away d

uratelongͲtecollision avospeciallyina

ntisformulaporatesbotensitylargeravelͲtime tth

alculatestheatedsuchtha

(1)

in the direced by integrcan think oleastresista

ke scenarioway the littledirecting the

ermplanninoidance proareasofden

atedasacoh theachieeͲscaleobstato move fro

etotaltraveatitsatisfies

ction of therating the cof the globance

o The arrowe critters th

characters

5

gwithfullvoblem Togensecongestio

stfunctionvable speedaclesetc)foom one loc

elͲtime (potestheeikona

e gradient ost functional crowdmo

ws visualizeey scamper away from

55|P a g e

visibilityandether theseon

(sometimesd (basedonorlocationscation to a

ential) fromlequation

ሻݔሺ It isn along theovement as

e character away Note a potential

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

56|P a g e

By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows

1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring

KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof

KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist

NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure

Figure 3 Illustration of neighboring cells used in finite difference approximation

Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation

ቀఝಾఝಾቁଶቀఝಾఝಾ

ቁଶൌ ͳ (3)

Chapter 3 March of the Froblins

57|P a g e

Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows

1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf

thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire

tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3

toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor

thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

56|P a g e

By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows

1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring

KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof

KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist

NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure

Figure 3 Illustration of neighboring cells used in finite difference approximation

Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation

ቀఝಾఝಾቁଶቀఝಾఝಾ

ቁଶൌ ͳ (3)

Chapter 3 March of the Froblins

57|P a g e

Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows

1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf

thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire

tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3

toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor

thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

57|P a g e

Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows

1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf

thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire

tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3

toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor

thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AN

5

wrA

3SfwetdaOc

Fo3UahaTc

AdvancesinReNTatarchuk(E

58|P a g e

with2GBofregularenginA

3213Co

Solving theefunctionfor

where isequivalenttothecostreladifferentpatagentdensit

Once thescacalculateሺݔ

Figure 4 Lefof the blue m

322 Loc

Unfortunateavoideachohaveanaccuavoidancem

The basic gocollisions wi

ealͲTimeRendeEditor)

fRAM and aneandmem

onstructing

eikonalequtheenviron

the final cooslopeaswatedtodynathingdepenynearagoa

alarcost funሻThegradݔ

eft to Right Cmarker

calNavigat

lysolving totherwithacuratebehavmodelthatre

oal of a locith nearby a

eringin3DGra

anATIRademoryclocks

theCostFu

ation requirmentinthe

ሻݔሺܨ

ost functionwellasanylaamichazardsdingonthealtoprevent

nctionܨሺݔሻdientofሺݔሻ

Cost functio

tionandA

heeikonalecceptablefidiormodelfoesolvesthese

calmodel isagents and

aphicsandGam

eonTM4870HLSLsource

unction

resacost fuFroblinsdem

ሻ ൌ ሺݔሻ

n is the srgestaticobsandsituationFtovercrowd

isconstructሻiscalculate

n potential

Avoidance

equationatdelityisprohorourcharaefineͲgraine

s to providealso to nav

mesCoursendashS

graphics carecodeforo

unction thatmoiscompu

ሻݔሺܦ

staticmovebjectssuchaareweighForexampleingatgoals

ted (Figureedusingcen

gradient of

aresolutionhibitivelyexctersweauedobstacles

e each indivvigate arou

IGGRAPH2008

rdwith512uriterative

tcanbeevautedasfollow

ሻǡݔሺܣ

ement costasbuildings)htsthatcanitmaybed

4) itcanbentraldifferen

f potential T

nhighenouxpensiveforugmentour

vidual agentnd obstacle

8

2MBofGDDeikonalsolv

aluatedatews

(4)

(including tisthedebeadjusteddesiredtoin

esupplied tnces

The goals (so

ugh for largearealͲtimeglobaleikon

twith a veles and agen

DR5 videomverislistedi

achgridcel

terrainmovensityofagedspatiallytoncreasethe

to theeikon

ources) are a

enumbersoapplicationnalsolution

ocity thatwnts towards

memory andinAppendix

l Thecost

ement costntsandisoencouragecostdueto

alsolver to

at the center

ofagentstoInordertowithalocal

will preventits desired

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

59|P a g e

destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AN

6

FbWIccpbatgbsdsoFdttlstdpaAdPstt

AdvancesinReNTatarchuk(E

60|P a g e

Figure 5 Agbin These ID

WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ

For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti

After vertexdepthvaluePointsthatashaderissettheendoftthis iteration

ealͲTimeRendeEditor)

gent positionDs are later

yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass

ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s

rwhichresurmingtheldquogngclipkillin

shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith

eringin3DGra

ns are rasterused to retri

eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes

nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi

ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage

aphicsandGam

rized into a ieve agent po

ersto0toifirstiteratioCounter is

as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou

mainsettotstIDwillget

gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels

assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha

mesCoursendashS

bin counterositions and

ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand

second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda

geometry shstobothbediscardedn

erwillbesetbufferwillcavenotyet

IGGRAPH2008

r and an arrad velocities

tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen

ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange

TT01]we arthatisgreateaderratherallowstheG

hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne

8

ay containin

reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere

gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo

geometry srasterizeraedandnotsationswhereheagentsthd The stre

ng IDs of ag

dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe

ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu

hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf

gents in that

oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed

the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling

thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

61|P a g e

containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AN

6

3AEssmtftcTf

wadt

FasTtd

AdvancesinReNTatarchuk(E

62|P a g e

3223Ag

AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa

The updatedfunctionresu

ݐ ൌ

whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi

Figure 6 Eaand velocitieshown

Twoexampltwo sets aredirectiongi

ealͲTimeRendeEditor)

gentMove

ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc

d velocity (Eult(Equation

ሻሺݏݏ ൌ

ൌ ఢ

ൌ పෝ

aperͲagentሻreturnstheoevaluatencethelast

ach agent eves of agents

esetsofdire identicalWehaveal

eringin3DGra

mentDirec

tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft

Equation 7)n6)andthe

ൌݓݐሺሻ

ሺݏݏݐ ሺݏǡ పෝሺݐݏ

factoraffecteminimumtistheglobsimulationf

valuates a fin its curre

rectionstobin the locasochosent

aphicsandGam

ctionDeterm

asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents

is then casmallesttim

ሺ ȉ

పሻ Τݐ ሻ

tingthepreftimetocollisbalnavigatioframe

fixed numberent and adja

beevaluatedl coordinatetoevaluate

mesCoursendashS

mination

ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir

lculated basmetocollisio

ሻǤ ͷ Ǥͷ

ferencetomsionwithallondirection

r of potentia

acent bins T

dforvelocitye frame defonlydirectio

IGGRAPH2008

ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision

sed on theoninthatdir

moveinthegagentsindiisthespeݏ

al movemenTwo agents a

yupdatearfined by thonsthatwo

8

lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh

direction wrection

globaldirectirectioneedofagent

t directions and their co

eshownine agent andouldcausea

modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad

with the larg

(5)

(6)

(7)

tionoravoidisthesetotandݐ

based on thorresponding

Figure6Nod its globalldquorightturn

mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach

gest fitness

dnearbyfdiscreteistheݐ

he positions g V sets are

otethatthe navigationrdquochange in

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

63|P a g e

orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol

324PathfindingResultsDiscussion

Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

64|P a g e

WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents

33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

65|P a g e

novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU

331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1

struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )

Listing 1 A stream-filtering geometry shader

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

66|P a g e

332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2

struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o

Listing 2 Vertex shader for view-frustum culling

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

67|P a g e

333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

68|P a g e

struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM

Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

69|P a g e

betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula

ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)

HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ

ௗ ୲ୟ୬ቀഇమቁ (9)

whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat

ቀௐଶቁ ͳ (10)

wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

70|P a g e

ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

71|P a g e

float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o

Listing 4 Vertex shader for per-instance occlusion culling

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

72|P a g e

335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands

RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )

Listing 5 Summary of our character management system

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

3 TcsftctmOdttbaipo

FccoadTmaofobt

Chapter 3 Ma

34 Cha

The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea

Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation

Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu

The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon

arch of the Fr

racterAn

nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa

canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in

xample of difgic shader

e and desired(b) User pl

we see the cushrooms (d)

of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence

oblins

nimation

h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto

ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp

ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe

mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes

ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU

redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th

ons that our determines tom the left ious poison ng away fro

eaceful restin

is illustratewherethehober isusedtouse the teeanimationweight annotableperfo

med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour

ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor

Froblins cathe current a(a) Froblin cloud in the

om the hazarng restores t

ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai

dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes

kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res

an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo

e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto

s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr

miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi

These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri

ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic

7

ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou

c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p

ns are controsed on each ed treasure tas a result bout to munts

ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo

73|P a g e

mationsandonbyvertexkthebonesfcharacters

merousdrawnd33)theursystemby

fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and

olled by the characterrsquos to the drop-they scatter

nch on some

ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

74|P a g e

Figure 8 Animation texture layout

Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss

float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering

Bone 1

Matrix Row 0

Matrix Row 1

Matrix Row 2

Bone 0

Matrix Row 0

Matrix Row 1

Matrix Row 2

Key 0 Key1 Key N

RGBA FP16

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

75|P a g e

void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)

Listing 6 Shader code to fetch interpolate and blend bone animations

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AN

7

3

F(st

Rsat(stfs

T

AdvancesinReNTatarchuk(E

76|P a g e

35 Tess

Figure 9 Ou(left) On theshaders and the tessellate

Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder

FroblLowr

Zbrushreghighr

Table 1 Com

ealͲTimeRendeEditor)

sellation

ur system alle right the textures W

ed character

erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof

lincontrolcagesolutionmod

resolutionFro

mparison of

eringin3DGra

andCrow

lows rendersame chara

While using thr on the left

GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste

edel

blinmodel

memory foo

aphicsandGam

wdRende

ing characteacter is rendhe same memwhereas the

cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo

tprint for hig

mesCoursendashS

ering

ers with extrdered withomory footprie low resolut

asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke

Polygons

5160faces

15+Mfaces

gh and low r

IGGRAPH2008

reme details ut the use oint we are ation characte

60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka

resolution m

8

in close-up of tessellatioable to add er has much

RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf

Vertexa

2Ktimes2K16b

Vert

Ind

models for ou

when using on using idehigh level ofcoarser silh

D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption

TotalMemory

ndindexbuffe

itdisplacemen

texbuffer~27

dexBuffer180

ur main char

tessellationentical pixel f details for

houettes

0and4000fied shaderdhardwareirect3Dreg11universally

ssellationasseauthoredormanceA

y

ers100KB

ntmap10MB

70MB

0MB

racter

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

Fc

Chapter 3 Ma

Figure 10 Ccharacter

arch of the Fr

Comparison

oblins

of low resolution modeel (top) and hhigh resoluttion model (

7

(bottom) for

77|P a g e

the Froblin

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AN

7

Hvtcodwotddc[

3

Ihho

F(vd

Gt1g

AdvancesinReNTatarchuk(E

78|P a g e

Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08

351 GPU

n this sectiohardwareashardware fixoutlinedinF

Figure 11 A(also referrevertices thusdisplacemen

Goingthrougtakesaninp15X times fgenerationo

CoarseS

ealͲTimeRendeEditor)

essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]

UTessellati

onwewillpsusedinouxedͲfunctionigure11

An overviewed to as the s amplifyingt obtaining

ghtheproceutprimitiveor Xboxreg 3ofgraphicsc

SuperͲPrimMesh

eringin3DGra

provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional

ionPipelin

provideanorsystemWn tessellator

w of the tesseldquocontrol cagg the input mthe final tess

essforasing(whichwe60 or ATI Rcards)Aver

Tessella

aphicsandGam

veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of

ne

overviewofWedesigned

unitavailab

ellation procgerdquo or ldquothe smesh The vesellated and

gleinputpolrefertoasaRadeonreg HDrtexshader

ator

mesCoursendashS

nefits speciis effective

ologyparamowamountorenderingooverallamodisplaceme

owsustoreddvideomemhe memoryrce Table 1f the benef

GPU tesselanAPIforableonrecen

cess We stasuper-primitertex shader

d displaced h

ygonwehaasuperͲprimD 2000Ͳ4000(whichwer

Tessellated Mesh

IGGRAPH2008

fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are

1demonstrafits provided

lationpipeliaGPUtesselntconsumer

art by rendertive meshrdquo)r is used to high resolutio

avethefollomitive)anda0 GPU generefertoasa

h

Di

8

al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell

ineavailablelationpipelirGPUsThe

ring a coars The tessellaevaluate suron mesh see

wingthehaamplifiesiterations ornevaluation

Evaluatesurfacepositions

Addsplacement

active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw

ducingtheoy relevant fy savings foation can b

eoncurreninetakingade tessellation

se low resoator unit genrface position on the righ

ardwaretess(upto411t64X for Di

nshader)is

TessellatedanMes

ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor

widthThisisverallgamefor consoleorourmainbe found in

tconsumerdvantageofnprocess is

lution mesh nerates new ons and add ht

sellatorunittrianglesorrect3Dreg 11invokedfor

dDisplacedsh

d

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

79|P a g e

eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

80|P a g e

struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

L GTHsc

Fdb Hnawddeae

Chapter 3 Ma

f f f f o o o o o r

Listing 7 Ex

GiventhatwTraditionallyHoweverthspacenormachangingthe

Figure 12 Ddisplaced in be shaded us

Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese

arch of the Fr

Convert po translatin somewhat f

float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor

ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS

return o

xample simp

wearerendey animatedereexistsaalmapsInteactualdisp

Displacementhe directio

sing normal

e intend toordertocommely thatwo generatentandnormantmapmustthe tangen

MDGPUMeserequireme

oblins

osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota

S = mul( mVPS = float4( v = vNormalWS = vTangentW

S = vBinormal

le evaluation

eringourchcharactersconcernwithatcasewelacednorma

nt of the veon of geometԢ

light thedismbineTSNMwearecomthe displa

almapsalsotbe generantͲspace norhMappertontsaremet

tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir

float4( vPovPositionWS S WS lWS

n shader for

aracterwithare rendehregardstoeareessental(asshown

rtex modifietric normal

splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr

e from objectrotating abo

vPositionOS vNormalOS )vTangentOS )vBinormalOS

ositionWS 1 1 )

r rendering i

hdisplacemered and litousingdisplatiallygeneratninFigure12

es the norma displacem

faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour

t space to woout y we can

) + vInstanc ) )

) )

nstanced tes

entmapweusing tang

acementmatinganewt2)

al used for ment amount

angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor

orld space byn simplify th

cePosWS

ssellated cha

emustmakegentͲspaceappingwhentangentfram

rendering

t D The resu

cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa

8

y rotating anhe math

aracters

eanoteabonormal manlightingusimeasweare

P is the oriulting point

mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat

81|P a g e

nd

outlightingps (TSNM)ngtangentͲedisplacing

iginal point Prsquo needs to

heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan

t

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

82|P a g e

353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows

ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ

HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive

353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

83|P a g e

vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

84|P a g e

Figure 13 Data format used for compressed animated vertices

Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots

X (16)

Y (16)

Z (16)

Nx (10)

Ny (11)

Tș (8)

Tij (8)

U (16)

V (16)

32 Bits

R

G

B

A

Nz (11)

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

85|P a g e

Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput

Listing 8 Compression code for vertex format given in Figure 13

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

86|P a g e

float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert

Listing 9 Decompression code for vertex format given in Figure 13

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

87|P a g e

354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

88|P a g e

Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization

36 LightingandShadowing

361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification

Displacement map Rendered character using the displacement map with a seam

Zoomed-in view shows a crack in the surface

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

3

F[cIdp[Semplaubd3Omdmsatg

Chapter 3 Ma

362Ligh

Figure 15 A[left] the fulcheckerboard

nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha

3621SH

Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly

arch of the Fr

hting

A medium relly shaded ted pattern ov

onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado

HLMGenera

rsceneiscoadirectionageenvironmr desired ligtakenata

elyhalfofouwouldworkcaptureap

oblins

esolution sperrain [centverlay to indi

bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap

ation

mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti

herical harmter] just the icate light m

ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai

twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm

monic light lighting fro

map texel den

usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows

oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar

map is usedom the sphernsity

AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest

plingalsoenceneelemenrgeneratingchniquefordouble shad

mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su

d to light a hrical harmon

sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh

nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti

primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha

8

highly detainic light map

edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg

itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints

89|P a g e

iled terrain ap [right] a

snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere

sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AN

9

tstpc

FsnhaseAdtmrp

3Fsulfrcsn

AdvancesinReNTatarchuk(E

90|P a g e

terrainthatsamplestakethiscan leadpracticewecompletelig

Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment

Ateach samdirect light ftesting foromodestrecuradianceistpointtexture

3622Re

For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu

ealͲTimeRendeEditor)

mayrequireentoofarofdtomissingfoundthathtingenviro

The lighting n on the terr

for shadingof the lightin

ely half of oracters as wthat capture

mplepointdfrom the suocclusion Inursion limithenstoredesusingthe

enderingUs

ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues

eringin3DGra

ethemissinffthegroungorunnaturtakingsamp

onmentsthat

environmenrain surfaceg characterng environmour charactewell as the es bounced l

directand inn is computdirect lightinalldirectusing3rdordOpenEXRim

singSHLM

k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can

aphicsandGam

g lowerhemdcancreateralterrainshplesatamotwereusefu

nt is capturee does not cors that may

ment [Centerersrsquo height terrain itse

lighting from

ndirect lighttedby castifromthesutionsonthedersphericamagefilefor

der spectraonschemes[ound that thsofhighconlitylighting

mpressedSHLonenttexturnnot be sto

mesCoursendashS

misphereofeartifactswhadowbounoderateheigulforshadin

ed at a poinontain usefu

y have shadr] A sample ensuring th

elf [Right] Am the ground

t iscapturedingadistribunand fromsphereusialharmonicsmat

l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this

IGGRAPH2008

the lightingwhenusedfondariesaswghtabovethgboththet

nt by firing rul lighting dading normals

point is offshe captured A characterd below

dandprojecbutionof raythesky iscingastratifisThisdata

a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau

8

genvironmeorshadingthwellas incorheterrainwterrainandt

rays into theata in the los that poin

fset from the lighting env

r requires a

cted into spys in thedicollectedbyiedsamplingiswrittento

n RGBA16Fourmemoryficientsgaveboundaries

werresolutioattheDCcominimallossse the form

entOntheoheterrainInrectselfͲref

wewereablehecharacte

e environmeower hemispt down intoterrain at avironment is full spheric

phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ

texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh

mat does no

otherhandnparticularlectance Inetocapturers

ent [Left] A phere and is o the lower a distance of s useful for cal lighting

monicsThehe sunandrayswithaThe incidentͲbitfloating

Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for

f

g

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

FlldquotSarttsddOwdtete

Fmos

Chapter 3 Ma

Final lightingight from thldquoresidualrdquoenthelistingat

Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon

Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment

Figure 17 mountain thoccluded regshadowing a

arch of the Fr

g is computhe linear tenvironmenttheendoft

aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting

ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee

Characters he right halfgion of the

artifacts

oblins

ed inapixerms then slighting [SLOthissection

dynamicthͲtime shado

WedidnotweshadowingshowninFihe lightmaelightingenadominant

ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample

cast shadowlf is in diree terrain [B

el shaderbysumming theOAN08]Dire

heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma

t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid

ws on the teect sun lightBottom] A

y sampling te contributiectXregHLSLs

scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh

dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee

errain The t [Top] Chshadow cor

theSHLM ron of this dshadercode

bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh

ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse

left half ofharacters inrrection fac

removing thdominant diedemonstra

theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron

ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection

f the image correctly ca

ctor is appl

9

edominantirectional ligating this is

sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment

mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo

is in the shast double slied to prev

91|P a g e

directionalght and theprovided in

essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora

mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting

hadow of a shadows on vent double

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

92|P a g e

WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

Fe

Chapter 3 Ma

Figure 18 Denvironment

arch of the Fr

Dynamic chafor shading

oblins

aracters (bog by sampling

ttom) and otg from the te

ther static scerrainrsquos SHL

cene props (LM

(top) build a

9

an approxim

93|P a g e

mate lighting g

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

94|P a g e

Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

95|P a g e

out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

96|P a g e

smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)

Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

C

3 IcmcierWeamfeit

3 WSpaaHbedB

Chapter 3 Ma

37 Con

n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson

38 Ack

WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP

arch of the Fr

nclusion

terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2

nowledg

ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor

oblins

ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w

ements

allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork

s aspects ofallelization apath findingn our large(frog goblin

democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta

ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool

f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful

orkingandoves a speciad artists froandwersquodli

MDGameCoasAMD graTimKelleya tool speandwersquodp

g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit

overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l

ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan

olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than

9

crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou

on count (6ndshadowin

ntributedtopatience diland Exigenteircontribu

AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh

97|P a g e

utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million

ngsolution

othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

98|P a g e

39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters

Ltd[AMDGMM08]AMDGPUMeshMapper2008

httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ

RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA

Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity

ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93

Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238

[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley

[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel

SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ

JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010

[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008

[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006

[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)

[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008

[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA

[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538

[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

99|P a g e

[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames

[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008

[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)

100|P a g e

AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE

Listing 11 HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver

Chapter 3 March of the Froblins

101|P a g e

====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi

Listing 11 (cont) HLSL code for iterative eikonal solver